Query using tiledbsoma
The first guide showed how to query for AnnData objects.
This guide queries “Census”, i.e., a tiledbsoma array store that concatenates many AnnData objects.
Load your LaminDB instance for querying data:
!lamin load laminlabs/cellxgene
→ connected lamindb: laminlabs/cellxgene
import lamindb as ln
import bionty as bt
import tiledbsoma
census_version = "2024-07-01"
→ connected lamindb: laminlabs/cellxgene
Query data
Create lookups so that we can auto-complete valid values:
features = ln.Feature.lookup(return_field="name")
assays = bt.ExperimentalFactor.lookup(return_field="name")
cell_types = bt.CellType.lookup(return_field="name")
tissues = bt.Tissue.lookup(return_field="name")
ulabels = ln.ULabel.lookup()
suspension_types = ulabels.is_suspension_type.children.all().lookup(return_field="name")
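Lookup attributes are Python-safe versions of the underlying names, which is why the assay `10x 3' v3` is accessed in this guide as `assays.ln_10x_3_v3`. The sketch below is a simplified illustration of that kind of normalization, not LaminDB's implementation; the function name `to_attribute_name`, the `ln_` prefix handling, and the exact rules are assumptions chosen to reproduce the names used here.

```python
import re


def to_attribute_name(name: str, prefix: str = "ln_") -> str:
    """Hypothetical helper: turn a display name into a valid attribute name."""
    # lowercase and collapse every run of non-alphanumeric characters into "_"
    key = re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")
    # identifiers cannot start with a digit, so prepend a prefix in that case
    if key[:1].isdigit():
        key = prefix + key
    return key


print(to_attribute_name("microglial cell"))  # microglial_cell
print(to_attribute_name("10x 3' v3"))  # ln_10x_3_v3
```

In practice, tab completion on the lookup object is the reliable way to find the right attribute; the rule above only helps to read names such as `ln_10x_3_v3`.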
Create a query expression for a tiledbsoma array store.
value_filter = (
    f'{features.tissue} == "{tissues.brain}" and {features.cell_type} in'
    f' ["{cell_types.microglial_cell}", "{cell_types.neuron}"] and'
    f' {features.suspension_type} == "{suspension_types.cell}" and {features.assay} =='
    f' "{assays.ln_10x_3_v3}"'
)
value_filter
'tissue == "brain" and cell_type in ["microglial cell", "neuron"] and suspension_type == "cell" and assay == "10x 3\' v3"'
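The filter is a plain string of column conditions joined with `and`, so it can also be assembled programmatically. The following is a minimal sketch under the assumption that every value is a string without embedded double quotes; `build_value_filter` is a hypothetical helper, not part of tiledbsoma or LaminDB.

```python
def build_value_filter(conditions: dict) -> str:
    """Hypothetical helper: join column conditions into one filter string."""
    clauses = []
    for column, value in conditions.items():
        if isinstance(value, (list, tuple)):
            # several allowed values become an `in [...]` membership test
            options = ", ".join(f'"{v}"' for v in value)
            clauses.append(f"{column} in [{options}]")
        else:
            # a single value becomes an equality test
            clauses.append(f'{column} == "{value}"')
    return " and ".join(clauses)


example_filter = build_value_filter(
    {
        "tissue": "brain",
        "cell_type": ["microglial cell", "neuron"],
        "suspension_type": "cell",
        "assay": "10x 3' v3",
    }
)
print(example_filter)
```

With these inputs the helper reproduces the expression shown above. Building the conditions from lookup values, as this guide does, has the advantage that misspelled names fail early as attribute errors.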
Query for the tiledbsoma array store that contains all concatenated expression data.
census = ln.Artifact.filter(description=f"Census {census_version}").one()
Query slices within the array store. (This will run a lot faster from within the AWS us-west-2 data center.)
human = "homo_sapiens"  # subset to human data
# open the array store for queries
with census.open() as store:
    # read SOMADataFrame as a slice
    cell_metadata = store["census_data"][human].obs.read(value_filter=value_filter)
    # concatenate results to pyarrow.Table
    cell_metadata = cell_metadata.concat()
    # convert to pandas.DataFrame
    cell_metadata = cell_metadata.to_pandas()
cell_metadata.shape
(66418, 28)
cell_metadata.head()
| | soma_joinid | dataset_id | assay | assay_ontology_term_id | cell_type | cell_type_ontology_term_id | development_stage | development_stage_ontology_term_id | disease | disease_ontology_term_id | ... | tissue | tissue_ontology_term_id | tissue_type | tissue_general | tissue_general_ontology_term_id | raw_sum | nnz | raw_mean_nnz | raw_variance_nnz | n_measured_vars |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 48182177 | c888b684-6c51-431f-972a-6c963044cef0 | 10x 3' v3 | EFO:0009922 | microglial cell | CL:0000129 | 68-year-old human stage | HsapDv:0000162 | glioblastoma | MONDO:0018177 | ... | brain | UBERON:0000955 | tissue | brain | UBERON:0000955 | 15204.0 | 3959 | 3.840364 | 209.374207 | 27229 |
1 | 48182178 | c888b684-6c51-431f-972a-6c963044cef0 | 10x 3' v3 | EFO:0009922 | microglial cell | CL:0000129 | 68-year-old human stage | HsapDv:0000162 | glioblastoma | MONDO:0018177 | ... | brain | UBERON:0000955 | tissue | brain | UBERON:0000955 | 39230.0 | 5885 | 6.666100 | 875.502870 | 27229 |
2 | 48182185 | c888b684-6c51-431f-972a-6c963044cef0 | 10x 3' v3 | EFO:0009922 | microglial cell | CL:0000129 | 68-year-old human stage | HsapDv:0000162 | glioblastoma | MONDO:0018177 | ... | brain | UBERON:0000955 | tissue | brain | UBERON:0000955 | 9576.0 | 2738 | 3.497443 | 121.333753 | 27229 |
3 | 48182187 | c888b684-6c51-431f-972a-6c963044cef0 | 10x 3' v3 | EFO:0009922 | microglial cell | CL:0000129 | 68-year-old human stage | HsapDv:0000162 | glioblastoma | MONDO:0018177 | ... | brain | UBERON:0000955 | tissue | brain | UBERON:0000955 | 19374.0 | 4096 | 4.729980 | 464.331956 | 27229 |
4 | 48182188 | c888b684-6c51-431f-972a-6c963044cef0 | 10x 3' v3 | EFO:0009922 | microglial cell | CL:0000129 | 68-year-old human stage | HsapDv:0000162 | glioblastoma | MONDO:0018177 | ... | brain | UBERON:0000955 | tissue | brain | UBERON:0000955 | 8466.0 | 2477 | 3.417844 | 162.555950 | 27229 |
5 rows × 28 columns
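Once the slice is a `pandas.DataFrame`, ordinary pandas operations apply. As a sketch, the snippet below counts cells per `cell_type` and `disease` on a small stand-in frame; the rows are made-up placeholders rather than Census records, and only the column names are taken from the table above.

```python
import pandas as pd

# stand-in for `cell_metadata`; placeholder rows, not real Census records
toy_metadata = pd.DataFrame(
    {
        "cell_type": ["microglial cell", "microglial cell", "neuron", "neuron", "neuron"],
        "disease": ["glioblastoma", "normal", "normal", "normal", "glioblastoma"],
    }
)

# number of cells for each (cell_type, disease) combination
counts = (
    toy_metadata.groupby(["cell_type", "disease"], observed=True)
    .size()
    .reset_index(name="n_cells")
)
print(counts)
```

The same `groupby` call should work on `cell_metadata` itself. If the columns come back as pandas categoricals, `observed=True` keeps only the combinations that actually occur in the slice.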
Create an AnnData
with census.open() as store:
    experiment = store["census_data"][human]
    adata = experiment.axis_query(
        "RNA",
        obs_query=tiledbsoma.AxisQuery(value_filter=value_filter)
    ).to_anndata(
        X_name="raw",
        column_names={
            "obs": [
                features.assay,
                features.cell_type,
                features.tissue,
                features.disease,
                features.suspension_type,
            ]
        }
    )
adata.var = adata.var.set_index("feature_id")
adata
AnnData object with n_obs × n_vars = 66418 × 60530
obs: 'assay', 'cell_type', 'tissue', 'disease', 'suspension_type'
var: 'soma_joinid', 'feature_name', 'feature_length', 'nnz', 'n_measured_obs'
adata.var.head()
feature_id | soma_joinid | feature_name | feature_length | nnz | n_measured_obs |
---|---|---|---|---|---|
ENSG00000000003 | 0 | TSPAN6 | 4530 | 4530448 | 73855064 |
ENSG00000000005 | 1 | TNMD | 1476 | 236059 | 61201828 |
ENSG00000000419 | 2 | DPM1 | 9276 | 17576462 | 74159149 |
ENSG00000000457 | 3 | SCYL3 | 6883 | 9117322 | 73988868 |
ENSG00000000460 | 4 | C1orf112 | 5970 | 6287794 | 73636201 |
adata.obs.head()
| | assay | cell_type | tissue | disease | suspension_type |
---|---|---|---|---|---|
0 | 10x 3' v3 | microglial cell | brain | glioblastoma | cell |
1 | 10x 3' v3 | microglial cell | brain | glioblastoma | cell |
2 | 10x 3' v3 | microglial cell | brain | glioblastoma | cell |
3 | 10x 3' v3 | microglial cell | brain | glioblastoma | cell |
4 | 10x 3' v3 | microglial cell | brain | glioblastoma | cell |