Introduction

Rigorous definitions of cell identity are a sign of field maturity, since they rely on a detailed understanding of connections between cellular phenotype, function, and developmental origins. Cells are the basic unit for organisms, and ways in which they differentiate, become arranged into complex structures or systems, and maintain or shift their function throughout life are processes that are understood incompletely and to different degrees across organ systems. Historically, cells have been defined by features such as morphology, location, and interactions with other cell types. In immunology, advances in multidimensional analyses during the 1970s launched a revolution in understanding the function and origin of immune cells that revealed a spectrum of distinct, highly specialized cell types [1]. Immunology progress was driven in part by access to single cell suspensions and the observation that morphologically similar lymphocyte populations contained cell subsets with highly contrasting functions that could be distinguished by surface antigens. These facts drove the creation of quantitative, single-cell identification systems, including the clusters of differentiation (CD) marker system [2], and closely linked advances in analytical cytometry and fluorescence activated cell sorting (FACS) to advances in immunology [3]. Today, cell type definitions typically include some combination of phenotype, lineage origin, fate potential, and capacity to perform a key function in the future. In immunology, the term “polarization” is used to describe stable, reversible sets of cell states that, with the appropriate signal in the corresponding context, can be switched between contrasting functions [4]. This is similar to the term “plasticity” in neuroscience [5] and cancer biology [6], where the boundary between cell state and cell function is also not well defined. 
There is now also strong evidence such plasticity exists for both blood and tissue macrophages, which further supports the idea that dysfunctional cells might be reconditioned in settings of an injury or a disease [7]. Other examples of cell states could include signaling ability [8], proliferation or quiescence status, and being memory or naïve, all of which are key refinements to the concept of an identity that can distinguish dramatically different subtypes of cells within an otherwise shared lineage. Critically, the majority of cell states are thought to be encoded in proteins and their posttranslational modifications and are not directly detectable in DNA or RNA sequence reads [9]. A cell’s gene expression program is thus a window into the cell’s potential [10]; however, disconnects between RNA and protein raise the concern that a snapshot of the transcriptome may not reflect a cell’s current functional identity.

Individually, techniques such as histological and morphological assessments, genome-wide profiling, and epigenomic, transcriptomic, proteomic, and metabolomic analyses have been vital tools for defining cell identity [11]. While these techniques have progressed from bulk analysis of sorted or enriched cell populations to true analysis of individual cells, major differences remain in their practicality for single-cell analysis of primary tumors [9]. Furthermore, no one method is widely accepted across fields as sufficient to define a given single cell’s identity. The jargon of field-specific cell definitions also presents a barrier to harmonizing cell classifications across research areas, diseases, and tissue types. This is especially true in tumor microenvironments where complex mixtures of cell types with abnormal functional identities are observed. Additionally, in cancer biology, the goal is frequently to prove a functional identity (viz., “stem,” “malignant,” “suppressor”) that requires study and testing of living cells. The plasticity of mature cells in response to environmental cues and stimuli can lead to changes in cell state and even identity. While studying these processes is especially relevant for understanding cancer [12], it is also challenging to maintain cells in research systems without disrupting the exact processes to be measured. Live cells, whether in a tumor or in a research lab, change over time, especially when the environmental context around them is altered. In addition to epigenetic changes, selective pressures can enrich for mutations in regulatory genes that lead to a shift in cell identity and contribute to cellular reprogramming, as commonly observed in cancer. Thus, oncogenesis might be thought of as a process whereby cells gain the ability to change their identity. 
For a malignant cell, the advantages of such plasticity include fate flexibility, which creates a diverse pool of cells that can withstand a range of treatments or immune responses, and useful new functions such as stem/progenitor self-renewal abilities [13, 14].

Addressing challenges in stem and cancer cell biology

Links between plasticity of cell identity and malignant transformation have led to widely used terms like “cancer stem cell” (CSC), a term that means very different things in different research contexts. For example, CSCs were originally proposed as a concept to explain clinical observations like therapy evasion [15], but CSCs can also be an allusion to a proposed origin of the cancer from a stem or progenitor cell [16, 17], and CSCs can be used as a name for a cell subtype that is capable of transferring malignancy or repopulating multiple tumor cell types [18, 19]. In brain tumors, multiple proteins have been proposed as defining markers of glioma stem cells, including CD133 [18], SOX2 [20, 21], NESTIN [22], EGFR [23], and CD15 [24]. A simultaneous analysis of these proteins reveals that there are multiple subpopulations of glioma stem cells present within individual tumors, and helps to resolve which of these cell subsets are associated with clinical outcomes [25]. Field-specific differences in cell identity definitions will be discussed more below, but as a starting point the authors suggest, as a minimum best practice, that all fields use the cytometry hallmark of formally defining the cell identification system the first time a cell label is used (for the immunologists: show your gates). For example, for the term “helper T cell,” there are multiple ways to identify and even isolate this functional cell group, and immunologists generally are required to “show their work” the first time they mention a cell type by displaying the exact order and set of proteins used to include or exclude cells from that definition label. To go beyond this, we also recommend that a table, label, or plot of summary statistics (e.g., heatmap) be reported for all measured features. As an example, marker enrichment modeling (MEM) labels provide human- and machine-readable reports of enrichment on a 10-point scale [26].
The original goal of the MEM label was to develop an automated, quantitative version of the immunology practice whereby an isolated cell subpopulation is described with a string of observed protein expression values (e.g., CD34hi CD38lo/− as a defining label for blood stem cells or CD3+ CD4+ CD8− FOXP3+ as a defining label for regulatory T cells). The MEM algorithm calculates a label based on data and scales values to a 10-point scale, generating a label like “CD44+8 CD38+8 CD8+7 ICOS+7 CD45+6 CD45R0+6 CD26+6 PD-1+6…”. This machine-generated label can be used to identify the population as CD38hiICOShi memory CD8 T cells; this example label described the main population of SARS-CoV-2-specific T cells induced by RNA vaccination [27]. Quantitative labels like MEM can be used to compare phenotypes across analysis runs, experiments, or instrument platforms and have been applied in brain tumors to define cells both by total protein and phospho-protein state [25].
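The idea of a machine-generated enrichment label can be illustrated with a short sketch. It is not the published MEM implementation: it assumes the commonly described form of the score (the difference in median intensity between a population and a reference, plus a ratio of their interquartile ranges), and the IQR floor, the scaling by the largest absolute score, the function names, and all intensity values are hypothetical choices made for illustration.

```python
from statistics import median, quantiles

def iqr(values, floor=0.5):
    """Interquartile range, floored so tiny spreads do not dominate (floor is an assumed value)."""
    q1, _, q3 = quantiles(values, n=4)
    return max(q3 - q1, floor)

def enrichment_scores(population, reference):
    """Raw signed enrichment per marker.

    population, reference: dict of marker -> list of transformed intensities.
    """
    scores = {}
    for marker in population:
        diff = median(population[marker]) - median(reference[marker])
        raw = abs(diff) + iqr(reference[marker]) / iqr(population[marker]) - 1
        # The sign follows the direction of the median difference.
        scores[marker] = raw if diff >= 0 else -raw
    return scores

def enrichment_label(scores, min_abs=1):
    """Scale scores to the range -10..+10 and format them as a text label."""
    largest = max(abs(s) for s in scores.values()) or 1.0
    scaled = {m: round(10 * s / largest) for m, s in scores.items()}
    ranked = sorted(scaled.items(), key=lambda kv: -abs(kv[1]))
    return " ".join(f"{m}{v:+d}" for m, v in ranked if abs(v) >= min_abs)

# Hypothetical intensities for one cell population and a reference of all other cells.
population = {"CD8": [5.0, 5.2, 4.8, 5.1, 4.9], "CD4": [0.1, 0.2, 0.1, 0.3, 0.2]}
reference = {"CD8": [0.5, 1.0, 3.0, 0.2, 4.5], "CD4": [0.5, 3.0, 4.0, 0.2, 3.5]}

label = enrichment_label(enrichment_scores(population, reference))
print(label)
```

Because the label is plain text built from numeric scores, it can be read by a person and also parsed and compared by software across runs or platforms.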

Cancer research has focused on the cellular and molecular characterizations of bulk tumor masses in the past. However, there is strong evidence that many tumors, including glioblastoma, are composed of heterogeneous cell types. It remains unclear whether this heterogeneity is strictly genetic or if it arises from a cellular hierarchy of growth and differentiation within the tumor that allows for cells to employ new and diverse cell identity programs [28, 29]. For example, the disruption of pathways regulating self-renewal and differentiation through the acquisition of transforming mutations in leukemia generates leukemic stem cells that possess an altered differentiation program. This was demonstrated by aberrant expression of surface markers and ability to give rise to an altered developmental hierarchy that retained aspects of its normal counterpart [30]. To further dissect if this mechanism also holds true across many tumors, it will be valuable to conduct functional experiments with subpopulations of viable cells isolated from tumors that have been rigorously identified by quantitative cytometry.

Historical and modern best practices

Both immune and neural cells are well characterized based on phenotype, structure, and function. Tumor cells, however, are often found in intermediate differentiation steps that coopt different phenotypes and functions, enabling them to survive and proliferate even in unfavorable conditions. Tumors can achieve complexity at the level of organs or tissues with dynamic regulation, organization of immune structures, and support of other nonmalignant cellular populations and structures during tumor initiation, maintenance, and progression. Here, we propose a combination of approaches from the fields of immunology and neuroscience that may be helpful for characterizing and aligning glioblastoma cell subsets to their nearest neural cell cognate (Fig. 1).

Fig. 1

Phenotypic identifiers of cells in the neural lineage. A diagram of human neural cell identities is shown in the style of hematopoietic immune cell identity maps. Bold labels indicate neural cell types. Protein name labels highlight markers that are proposed to define neural cell identities, including cell surface proteins (red), transcription factors (blue), and other intracellular proteins (green). Lines connect multipotent stem and progenitor cells (top) with the neural lineage cell types they can produce (middle and bottom). This diagram attempts to be cell-intrinsic and to highlight surface marker sets that distinguish each major cell type, as in traditional immunology views of cell identity where FACS separation of live cells can be used for functional testing. Thus, the diagram does not explicitly consider other features that are classically important to understanding neural cell identity, such as morphology, location, and structure [31,32,33]. Notably, radial glial progenitors, a type of neural stem cell, are conceptually separated from adult neural stem cells both by a key difference in potency (only radial glial progenitors are normally thought to produce ependymal cells) and by developmental time (radial glial progenitors are only observed prenatally in humans).

Historically, the hematopoietic system has been used as a paradigm to illustrate a developmental hierarchy of cells that is sustained by a population of long-lived, quiescent, pluripotent stem cells capable of self-renewal and contributing to the replenishment of a spectrum of mature immune cell types [34]. The discovery and characterization of hematopoietic stem cells (HSCs) relied heavily on the ability to identify and purify such cells [35], as these cells were first defined by their functional ability to regenerate all hematopoietic lineages in vivo [36]. Repopulation assays using hematopoietic stem cells have since been performed to help investigate heterogeneity in HSCs [37]. This highlights how functional assays played a major role in the identification of different immune cells [38, 39]. Table 1 summarizes and highlights different ways in which the field of immunology established cell identity definitions across cells of a hematopoietic lineage. The establishment of the CD marker system was central in reshaping the way immune cells are identified, in this case based on the molecules expressed on their surface [40]. CD molecules are now routinely used as definitional cell markers, allowing for the scoring of the presence and proportions of specific leukocyte cell subsets. For example, CD45, also known as the leukocyte common antigen, is a receptor-linked protein tyrosine phosphatase that is expressed on nucleated cells of a hematopoietic lineage and can be used to distinguish immune cells from other cell types [41]. As with cell surface signaling proteins, transcription factor proteins can greatly influence the molecular content and function of cells. Measuring the expression of a specific transcription factor can provide information about the state of a cell and how it is likely to respond to signaling cues or regulate the expression of important functional proteins. 
For example, T cells can functionally be defined by the expression of CD3, a signaling subunit of the T cell antigen receptor [42]; the expression of transcription factors such as FOXP3 can give additional context and suggest a functional identity as a regulatory T cell [43]. Distinct patterns of transcription factor expression are also illustrated across B cell maturation. While all B cells express CD19, a defining B lymphocyte antigen, B cells express different transcription factors such as PAX5, BCL6, and BLIMP1 throughout development and maturity. These transcription factors mark key B cell maturation events, including commitment to the B lineage in early B cells (PAX5) [44, 45]; encountering antigen, receiving T cell help, and becoming a germinal center B cell (BCL6) [46]; and specializing into a plasmablast (BLIMP1) [47] that will ultimately turn off much of the B cell program and generate an antibody-producing plasma cell. It has been shown that the activation of transcription factors not only marks cell maturation/differentiation, but activation of certain transcription factors can also lead to the dedifferentiation of hematopoietic cells [48]. Lineage inference techniques including single cell RNA sequencing have also suggested new models of cellular development or markers identifying transitional states [49]. Critically, inferred lineages based on protein expression pattern must be tested experimentally, as these approaches can misclassify cells when expression programs follow an off-on-off type of pattern where many genes are coordinated together, as with lymphocytes.
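The marker-based definitions described in this section (CD45 for cells of a hematopoietic lineage, CD3 for T cells, FOXP3 for regulatory T cells, CD19 for B cells) can be written down as an explicit, ordered set of gates, which is one way of "showing the gates" in a form that software can apply. The following is a minimal sketch under stated assumptions: the positivity thresholds, the gate order, the simplified marker sets, and the function names are all hypothetical, and real gates are set per experiment and per instrument.

```python
# Hypothetical positivity thresholds on transformed intensity (assumed values).
THRESHOLDS = {"CD45": 1.0, "CD3": 1.0, "CD4": 1.0, "CD8": 1.0, "CD19": 1.0, "FOXP3": 1.0}

# Ordered gates: the first definition whose requirements are all met wins.
# True means the marker must be positive, False means it must be negative.
GATES = [
    ("regulatory T cell", {"CD45": True, "CD3": True, "CD4": True, "CD8": False, "FOXP3": True}),
    ("T cell", {"CD45": True, "CD3": True}),
    ("B cell", {"CD45": True, "CD3": False, "CD19": True}),
    ("other hematopoietic cell", {"CD45": True}),
]

def is_positive(cell, marker):
    """A marker that was not measured is treated as negative in this sketch."""
    return cell.get(marker, 0.0) >= THRESHOLDS[marker]

def classify(cell):
    """Return the first gate label whose marker requirements the cell satisfies."""
    for label, requirements in GATES:
        if all(is_positive(cell, m) == wanted for m, wanted in requirements.items()):
            return label
    return "ungated"

def describe_gate(label):
    """Render a gate as a conventional marker string, e.g. 'CD45+ CD3+'."""
    requirements = dict(GATES)[label]
    return " ".join(f"{m}{'+' if wanted else '-'}" for m, wanted in requirements.items())

cell = {"CD45": 3.0, "CD3": 2.5, "CD4": 2.0, "CD8": 0.1, "FOXP3": 1.8}
print(classify(cell), "=", describe_gate(classify(cell)))
```

Keeping the gate order and marker requirements in one data structure means the definition behind each cell label can be reported alongside the results rather than reconstructed afterward.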

Table 1 Immunology and neurobiology examples of classic and modern cell identity definitions

Use of brain anatomy in neural cell identity

Neural cell identity characterization is based on both classic features such as cell shape/morphology, physiological location, and structure, as well as per-cell measurement of RNA or protein [54]. Table 1 reviews some of the classic ways of identifying cells and how they have been applied to neural cell identity. Given the complex architecture of the human brain, the effort to categorize neural stem cells and their progeny has focused extensively on their location across developmental time and space [31]. There are two main germinal structures where a series of distinct stages of neural progeny maturation have been well characterized: the ventricular-subventricular zone (V-SVZ) lining the lateral ventricles and the subgranular zone (SGZ) in the dentate gyrus of the hippocampus [82, 83]. Here, we have chosen to focus on the larger of the two niches, the V-SVZ, and its developmental antecedents, radial glia, as an example case of neural stem cells [84]. Radial glia are essential neural and glial progenitor cells in the prenatal brain. Their hallmark radial process serves as a physical guide for migrating neurons during the structural development of the brain [85]. The generation of radial glial cells is marked by the expression of several intermediate filament proteins including nestin and vimentin, which are known stem cell markers [86]. However, neural stem cells and their progeny highlight one of the major challenges of characterizing neural cell identity using protein expression markers as the hierarchy of stem, progenitor, and differentiated neural cells contains many areas of exception and overlap. For example, the simple category of “radial glia” in the prenatal brain has, of late, been expanded and subdivided as different subclasses have been found across developmental stages and species [87, 88]. 
Adult neural stem cells in the V-SVZ express glial fibrillary acidic protein (GFAP), which is also a historically well-established marker of most astrocytes [29, 72], and a subset of quiescent neural stem cells lacks nestin [59]. Similarly, ependymal cells express other markers, such as CD133/prominin-1, which are also seen on neural stem cells. However, ependymal cells have historically been considered separately from other neural origin cells, since ependymal cells are multiciliated, contiguous with choroid plexus epithelial cells, and form a monolayered barrier between the V-SVZ and the ventricle lumen [59, 89]. Thus, ependymal cells arise from a neural stem cell and provide epithelial functions. Another example of this apparent disconnect between function and lineage is seen in pulmonary neuroendocrine cells, which arise from an epithelial cell and provide neuroendocrine functions [90]. Thus, terms like “epithelial” should be clearly defined as referring to a current functional identity or a prior lineage or tissue origin.

Most neural cell types are functionally characterized and have a specific assay that is considered definitional. For example, neural stem cells are defined in vivo by their ability to self-renew and give rise to neural and glial progeny [91]. Stem-ness features are commonly tested in vitro using neurosphere assays, although extensive evidence has shown that the culture conditions for such assays can change the assay’s outcome [92,93,94]. For example, detection of long-term stem cells that are quiescent in vivo is especially challenging [59]. Astrocytes are the most abundant cell type in the brain and help regulate axons and blood flow and maintain homeostasis [95, 96]. Oligodendrocytes myelinate cells and regulate neuronal activities [97]. Neurons are electrically excitable, and their primary function is to relay electrochemical signals to, from, and within the brain [75]. Much effort has been spent to identify definitive markers of neural identity that will help distinguish them from other neural cells and further study them in the context of cancer. An extensive transcriptome analysis of neural cells has brought greater understanding of each cell type and their gene expression programs [98]. Transiently expressed transcription factors have been identified as definitional markers of cell identity, but transcription factors may be expressed in different cell subtypes across different stages of cell development or be expressed in an oscillatory fashion, complicating the interpretation of a single transcriptional snapshot [99]. For example, SOX2 is a well-established but not exclusive functional marker of neural stem cells, while PAX6 can be expressed both in neural stem cells as well as intermediate cell types like neural progenitors during neuronal differentiation [63, 66]. 
Lastly, immunophenotyping screens have aided in the identification of potential cell surface signatures of neural cells [32, 59, 72]; however, a better understanding of CD marker expression would help bridge the gap between descriptive and functional single cell analyses for neural origin cells.

Ultimately, while we note the immense amount of work that has been put toward characterizing neural cell identity, we think it is important that we continue to link more routine measurements of protein and RNA transcripts to critical functional determinants of neural cell identity, a gap noted by others in the field [100]. However, to achieve this, several challenges must be overcome. Currently, the most common source of healthy human brain tissue available for research is limited to formalin fixed paraffin embedded (FFPE) or fixed frozen tissues. Thus, it is especially hard to study and assess changes in signaling, metabolism, function, and overall state across developmental times in human due to the lack of living cells available for experimentation. One way in which neuroscientists have attempted to overcome this challenge is by using animal models including rodents, ferrets, pigs, and, in a few cases, primates. However, the use of non-murine organisms significantly raises the monetary and temporal cost of the research. The rise of organoid-based model systems in neuroscience has also been rapid. However, the field of human organoids has challenges, including 1) known ground truth in vivo in human tissue has not been well established enough to validate the organoid models, 2) most organoids lack tissue resident immune cells (e.g., microglia in the brain) that are increasingly understood to be critical to normal function of non-immune organ cells, and 3) organoids can take months to generate and the very heterogeneity that makes them outstanding models means that many more examples must be studied than in genetically homogeneous animal models or cell lines. However, organoid research is likely essential given that the rodent brain is a suboptimal comparator to the human brain, both at a molecular cell biology level and an anatomic level. 
Immunology, by contrast, had a rapid start as healthy human blood was more widely available and ethically reasonable to collect across most stages of human development. However, immunology is now encountering the same challenge as the field seeks to understand the role of immune cells in tissues, including tissue resident immune cells of non-hematopoietic origin.

Is “cell type” different from “cell state”?

It has been proposed that there are three pillars central to the concept of cell identity: lineage, phenotype (which here includes function), and cell state [101]. For example, a regulatory helper T cell might be of the hematopoietic stem cell and T lymphocyte lineages, might currently express proteins like CD3, CD4, and FOXP3, and could be in the states of actively signaling via its T cell antigen receptor and in the G1 phase of the cell cycle. We propose here that the borders between these concepts of lineage, phenotype, and state are not well defined at a chemical or temporal level and the concepts may thus have overlapping domains. In particular, the boundary for when a feature is considered to mark a distinct cell type (an identity) versus a state, which exists within an identity and a lineage, could be much better defined. Furthermore, while a given cell type as a population might be expected to express a set of genes, individual cells vary from their population’s statistical norm. If the RNA transcripts of two cells have detectably different levels of different transcripts, does this indicate that they are of different cell types or could they be of the same cell type and in a different state? Cells exist in flux across a spectrum of states, including the cell cycle, reversible transitions like metabolic programming and mTOR pathway activity, flux of ions like calcium, redox states including production of species like H2O2, and activity of phospho-protein-driven signaling networks that control the function of identity-defining transcription factors and other proteins. To what extent can a cell deviate from its population’s norm and still be considered a member of that group?
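The question of how far a cell can deviate from its population's norm can be made concrete with a small sketch. It assumes one simple convention, a robust z-score per marker based on the median and the median absolute deviation, and an arbitrary cutoff of 3; neither the statistic nor the cutoff comes from the studies discussed here, and the marker values are invented. The point of the sketch is that group membership depends on a threshold someone has to choose.

```python
from statistics import median

def robust_z(value, reference_values):
    """Distance of a value from the reference median, in units of scaled MAD."""
    med = median(reference_values)
    mad = median([abs(v - med) for v in reference_values])
    # 1.4826 rescales the MAD to match a standard deviation for normal data.
    scale = 1.4826 * mad if mad > 0 else 1.0
    return (value - med) / scale

def deviation_report(cell, population, threshold=3.0):
    """For each marker, report the robust z-score and whether it is within the cutoff."""
    report = {}
    for marker, value in cell.items():
        z = robust_z(value, [other[marker] for other in population])
        report[marker] = (round(z, 2), abs(z) <= threshold)
    return report

def within_norm(cell, population, threshold=3.0):
    """True only if every measured marker falls inside the chosen cutoff."""
    return all(ok for _, ok in deviation_report(cell, population, threshold).values())

# Hypothetical transformed intensities for a population of T cells.
population = [{"CD3": v, "FOXP3": w} for v, w in
              [(4.0, 0.2), (4.2, 0.3), (3.8, 0.1), (4.1, 0.2), (3.9, 0.4)]]

typical_cell = {"CD3": 4.0, "FOXP3": 0.2}
unusual_cell = {"CD3": 4.0, "FOXP3": 3.5}

print(within_norm(typical_cell, population), within_norm(unusual_cell, population))
```

Whether the second cell is called a different cell type or the same type in a different state is not answered by the statistic itself, which is the ambiguity raised above.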

Understanding how to define a normal cell’s states is particularly important for identifying when a cell travels out of normal physiological bounds into a pathological state. A prime example is seen in oncogenesis, and much attention has been paid to considering the boundary between healthy cells and malignant cells. Cells that exist in liminal spaces between cell identity groups or with the potential to shift into multiple identities are especially important to understand in fields where we seek to trigger or prevent specific cell identity changes, such as regenerative medicine and cancer biology. The degree to which a cell is biochemically constrained or encouraged to explore different identities, i.e., its intrinsic plasticity in identity, may be a critical piece of information for understanding stem cells, immune evolution, and cancer. In thinking about factors that influence a cell’s state, the tissue context is critical. The tissue environment includes inputs that can alter signaling and metabolism, and broadly dictates a cell’s functional capacities. Thus, mapping specific cell states to their corresponding cell identities might help define the changes seen in cancer or other disease contexts (explored for glioblastoma in Table 2). A leading example of this is seen in measurements of phospho-proteins, which are now well established as superior markers of clinically relevant blood cancer cells [8, 102,103,104,105], a finding recently extended to identify risk stratifying cells by signaling in glioblastoma [25], surgical recovery [106], and pregnancy [107]. It will likely also be critical to understand cell state identities to ensure the reliable performance of cell-based therapies, in which functions such as cytokine production, proliferation, and cell killing are central to efficacy and are likely defined by cell state identity.

Table 2 Cell identity approaches used in glioblastoma brain tumors

One persistent challenge forecast by studies discussed above is defining the differences between a stem cell, a cancer cell, and a cancer stem cell, and inferring a possible lineage, when these cell types are detected. Broadly speaking, it is difficult to determine whether a CSC is a cancer cell that has shifted its identity into a stem-like cell as a mechanism for more favorable survival, or rather a cell that arises from a population of healthy stem cells that have transformed into a malignant state [28]. When evaluating the existence of CSCs, it is important to keep in mind their potential for differentiation or plasticity, including the reemergence of states that resemble cells normally seen in earlier development [114]. Subsets of cells with different phenotypes are observed within and between tumors from different patients, and only some of these cell subsets will behave as CSCs in functional assays [17, 115,116,117]. Such cells are often able to undergo genetic changes that make it difficult to establish a standardized set of markers that would canonically define them. CSCs were first described in acute myeloid leukemia [118] and were later shown to be present in solid tumors including glioblastoma [18], where they are thought to contribute to therapy resistance and drive tumor growth [119]. CSCs have been functionally defined by assays such as xenotransplantation to assess their ability to self-renew and differentiate like stem cells [18, 103, 120]. However, relying solely on such functional assays to evaluate these cells makes it quite difficult to study them further considering the large number of resources required to validate one subset of cells. Although currently there are no individual markers that exclusively or definitively mark CSCs for all patients, markers including CD133, CD44, and CD15 have been useful in prospectively isolating or enriching some subsets of glioblastoma cancer stem cells (GSCs) [109]. 
Unfortunately, these markers are not exclusively expressed in GSCs. Using single cell technologies including mass cytometry, researchers can continue to characterize and further define markers that identify these cells and different transient states that are associated with them. By assessing a variety of surface markers, transcription factors, signaling molecules, and metabolic markers, profiles that will identify such cells from within heterogeneous populations can be established. One intriguing recent example used functional features (uptake and cross-cell transfer of specific dyes) to distinguish glioblastoma cells that are or are not enmeshed in a gap junction-coupled network, and then determined the transcriptional and invasive features of each subgroup [121]. Such profiles will enable the targeting and directing of cells into more favorable states or identities in the context of cancer.

Role of technology

Immunology largely owes its present status to multidimensional single cell analysis using cytometers [1], and this technology has long driven refinements in concepts of cell identity [3], beginning with the ability to prospectively sort marker-defined populations of cells for bulk analysis of transcripts and/or genomes. These initial approaches, while offering improved resolution of abundant or antigenically and functionally distinct cell populations, may not have been powered to identify fine differences within subsets of cells, such as cell state identities. Increases in dimensionality, throughput, and intracellular detection abilities in single cell technologies have driven a tremendous expansion in the mapping of cell lineages and trajectories that span a continuum between stem/progenitor cells and fully differentiated progeny.

While a comprehensive survey of technologies is beyond the scope of this review, it is worth noting the current state of the art in some key single cell and cytometry tools that have driven recent discoveries. Each technology presents strengths and weaknesses that should be considered in experimental design. Single-cell RNA sequencing and its relatives can examine thousands of parameters (transcripts) but currently have a much higher cost and lower per-feature dynamic range than antibody-based imaging or suspension cytometry [9], and sequencing-based detection especially suffers from signal dropouts where present molecules are not measured in cells, leading to the artificial appearance of heterogeneity even for commonly expressed molecules [52]. Flow cytometry has enabled the analysis of small panels of proteins/markers in individual single cells, and FACS-based isolation has long been employed for the functional and molecular profiling of heterogeneous cell populations [11]. Suspension cytometry, including mass cytometry, spectral flow cytometry, and fluorescence flow cytometry, lacks spatial and cytoarchitectural information but is able to quantify protein expression and posttranslational modifications in a sufficient number of cells to enable the detection of rare populations of cells, such as quiescent stem cells, while also including additional parameters that identify cell function and phenotype. Single cell approaches including spectral flow cytometry and mass cytometry offer greater resolution and more information per cell than conventional fluorescence cytometry [9, 50, 53, 122].
Multiplex imaging supports assessment of morphology and tissue structure, provides spatial and subcellular resolution (such as the location of protein expression within a cell), and can measure multiple proteins at once [123,124,125,126], whereas subcellular imaging techniques, such as super-resolution imaging, lack the throughput and higher dimensionality that existing multiplex fluorescence or mass imaging can provide. While multiplex imaging allows for subcellular comparison of co-expression, it generally relies on fixed tissues. Thus, the vast majority of collected tissue does not include live cells that can be used in functional experiments. Furthermore, the analysis of high dimensional imaging data can be a computationally intensive process [127].
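The dropout effect noted above for sequencing-based detection can be illustrated with a toy simulation. In this sketch every cell contains exactly the same number of transcript molecules, and each molecule is independently captured with a fixed probability; the capture probability of 10%, the molecule count, and the function names are illustrative assumptions rather than measured properties of any platform.

```python
import random

def simulate_observed_counts(n_cells, molecules_per_cell, capture_prob, seed=0):
    """Observed count per cell when each molecule is captured independently."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < capture_prob for _ in range(molecules_per_cell))
        for _ in range(n_cells)
    ]

def zero_fraction(counts):
    """Fraction of cells in which the transcript was not detected at all."""
    return sum(c == 0 for c in counts) / len(counts)

# Every simulated cell truly expresses 5 molecules of the transcript.
counts = simulate_observed_counts(n_cells=10_000, molecules_per_cell=5, capture_prob=0.1)

expected_zero = (1 - 0.1) ** 5  # probability that all 5 molecules are missed
print(f"cells with zero detected molecules: {zero_fraction(counts):.2f} "
      f"(expected about {expected_zero:.2f})")
```

Under these assumptions, a transcript present in every cell goes undetected in more than half of them, so the cells appear to split into expressing and non-expressing groups even though the underlying population is uniform.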

Historically, genome-wide profiling of single cells has enabled the unbiased exploration of cell identity, allowing for the discovery of possible known and unknown cell types at single-cell resolution. Yet inferring the identity of cells has become a renewed challenge as the expanding breadth and depth of single-cell data can now provide an unprecedented lens into the complexities and nuances of cell identities [128]. As an example, mass cytometry has recently been developed for studies on the nervous system, and a central finding of this work is that RNA transcript and protein expression do not align well in all cases [60]. Single cell technologies will continue to improve and allow for us to better understand the cellular and molecular processes that contribute to tumorigenesis and the tumor microenvironment and develop novel therapies and delivery mechanisms that will help treat refractory tumors.

Concluding thoughts

Dissecting the functional identity of cells in tissues is a central goal of cell biology, and a central goal of modern cytometry is to enable automated cell identification. However, distinct fields have different rules and conventions surrounding what distinguishes key cell subtypes, which features are definitional, and which functional tests are the ne plus ultra for each cell type. On one end of the spectrum is a cell like the T lymphocyte, which is distinct in DNA sequence, transcript, protein expression, and function, although it is largely lacking in distinguishing morphology. On the other end might be subtypes of neurons that were described largely by position and function, such as neurotransmitter responsiveness or calcium signaling patterns, but which lack a strict, known protein or DNA sequence identity.

Perhaps a helpful thought experiment would be to imagine we are a computer algorithm whose job is to correctly identify cells. What information would this algorithm need to be satisfied, and what is the level of confidence it needs? This quickly leads to a challenge: biologists have invented diverse systems to say when a given cell has shifted from one identity to another. In which cases might the algorithm be confident of a cell’s function without measuring that function? In this review, the goal was to highlight useful aspects of measuring diverse cellular features, from easily detectable surface markers to functions that must be measured over time in living cells. We hope this has especially brought out the usefulness of measuring surrogates of identity and developing realistic model systems, as well as the value of working with living human cells from primary tissues. This system of defining cell identity is urgently needed in the study of cellular diseases, especially cancers and neurodegenerative diseases, as it is crucial to distinguishing abnormal cell functions from healthy ones misplaced in space or developmental time.