
Overview

Brought to you by YData

Dataset statistics

Number of variables: 11
Number of observations: 94
Missing cells: 68
Missing cells (%): 6.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 69.6 KiB
Average record size in memory: 757.7 B

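The overview figures follow from simple counts. A missing cell is any empty (row, column) position, so 68 missing EFO values over 94 × 11 = 1034 cells gives 6.6%. Below is a minimal stdlib sketch. It assumes each row is a dict with `None` marking a missing value. `overview` is a hypothetical helper, not ydata-profiling's implementation.

```python
from typing import Any, Dict, List, Optional


def overview(rows: List[Dict[str, Optional[Any]]]) -> Dict[str, float]:
    """Recompute basic dataset statistics; None (or an absent key) marks a missing cell."""
    # Collect column names in first-seen order.
    columns: List[str] = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    n_obs = len(rows)
    n_cells = n_obs * len(columns)
    missing = sum(1 for row in rows for name in columns if row.get(name) is None)

    # A row counts as a duplicate if an identical row appeared earlier.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(name) for name in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "variables": len(columns),
        "observations": n_obs,
        "missing_cells": missing,
        "missing_pct": 100.0 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


if __name__ == "__main__":
    toy = [
        {"NCI_CODE": "C3868", "EFO": "MONDO_0011719"},
        {"NCI_CODE": "C3486", "EFO": None},
        {"NCI_CODE": "C3486", "EFO": None},
    ]
    print(overview(toy))
    # The report's own figure: 68 missing cells out of 94 rows x 11 columns.
    print(round(100 * 68 / (94 * 11), 1))  # prints 6.6
```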
Variable types

Text: 1
Categorical: 10

Alerts

SITE_SUBTYPE_3 has constant value "NS" [Constant]
PRIMARY_HISTOLOGY has constant value "gastrointestinal_stromal_tumour" [Constant]
HISTOLOGY_SUBTYPE_3 has constant value "NS" [Constant]
EFO is highly correlated overall with HISTOLOGY_SUBTYPE_1 and 2 other fields [High correlation]
HISTOLOGY_SUBTYPE_1 is highly correlated overall with EFO and 1 other field [High correlation]
HISTOLOGY_SUBTYPE_2 is highly correlated overall with EFO [High correlation]
NCI_CODE is highly correlated overall with EFO and 3 other fields [High correlation]
PRIMARY_SITE is highly correlated overall with NCI_CODE [High correlation]
SITE_SUBTYPE_1 is highly correlated overall with NCI_CODE [High correlation]
PRIMARY_SITE is highly imbalanced (91.5%) [Imbalance]
SITE_SUBTYPE_1 is highly imbalanced (91.5%) [Imbalance]
HISTOLOGY_SUBTYPE_2 is highly imbalanced (71.8%) [Imbalance]
EFO is highly imbalanced (60.9%) [Imbalance]
EFO has 68 (72.3%) missing values [Missing]
COSMIC_PHENOTYPE_ID has unique values [Unique]

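The imbalance percentages match a normalised-entropy score, 1 - H/ln(k). Here H is the Shannon entropy (natural log) of the frequencies of the non-missing values, and k is the number of distinct values. For PRIMARY_SITE (93 vs 1) this gives 0.915. For HISTOLOGY_SUBTYPE_2 it gives 0.718, and for EFO (24 vs 2, missing values excluded) it gives 0.609. The sketch below is a reconstruction checked only against those three figures. It is not taken from ydata-profiling's source, and the function name is hypothetical.

```python
import math
from typing import Sequence


def imbalance_score(counts: Sequence[int]) -> float:
    """Return 1 - H / ln(k) for category counts (0 = perfectly balanced)."""
    positive = [c for c in counts if c > 0]
    k = len(positive)
    if k <= 1:
        # A single category: ln(1) = 0 would divide by zero, so treat it as fully imbalanced.
        return 1.0
    total = sum(positive)
    entropy = -sum((c / total) * math.log(c / total) for c in positive)
    return 1.0 - entropy / math.log(k)


if __name__ == "__main__":
    # Value counts taken from the Common Values tables in this report.
    print(f"PRIMARY_SITE        {imbalance_score([93, 1]):.1%}")
    print(f"HISTOLOGY_SUBTYPE_2 {imbalance_score([82, 3, 2, 1, 1, 1, 1, 1, 1, 1]):.1%}")
    print(f"EFO                 {imbalance_score([24, 2]):.1%}")
```

A score near 0 means the values are spread evenly. A score near 1 means one value dominates. This is why a 93/1 split is flagged while HISTOLOGY_SUBTYPE_1 is not.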
Reproduction

Analysis started: 2025-07-15 00:43:24.358706
Analysis finished: 2025-07-15 00:43:25.005719
Duration: 0.65 seconds
Software version: ydata-profiling v4.16.1
Download configuration: config.json

Variables

COSMIC_PHENOTYPE_ID
Text

Unique 

Distinct: 94
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Length

Max length: 13
Median length: 12
Mean length: 12.202128
Min length: 12

Characters and Unicode

Total characters: 1147
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 94
Unique (%): 100.0%

Sample

1st row: COSO318721653
2nd row: COSO97595385
3rd row: COSO287721665
4th row: COSO36605385
5th row: COSO60695381
Value | Count | Frequency (%)
coso318721653 | 1 | 1.1%
coso97595385 | 1 | 1.1%
coso287721665 | 1 | 1.1%
coso36605385 | 1 | 1.1%
coso60695381 | 1 | 1.1%
coso34435546 | 1 | 1.1%
coso35205546 | 1 | 1.1%
coso31875763 | 1 | 1.1%
coso28815381 | 1 | 1.1%
coso36845546 | 1 | 1.1%
Other values (84) | 84 | 89.4%

Most occurring characters

Value | Count | Frequency (%)
O | 188 | 16.4%
5 | 147 | 12.8%
3 | 141 | 12.3%
8 | 96 | 8.4%
C | 94 | 8.2%
S | 94 | 8.2%
6 | 93 | 8.1%
7 | 89 | 7.8%
1 | 76 | 6.6%
4 | 49 | 4.3%
Other values (3) | 80 | 7.0%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 1147 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 1147 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 1147 | 100.0%


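Every COSMIC_PHENOTYPE_ID in this extract is 12 or 13 characters long. The IDs use only 13 distinct characters: C, O, S and the ten digits. Together with the sampled rows, this suggests the shape `COSO` followed by 8 or 9 digits. The pattern below is inferred from these statistics alone, not from a COSMIC specification, so treat it as an assumption when validating other releases. `invalid_ids` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Inferred from this report (lengths 12-13; characters C, O, S, 0-9). An assumption.
COSO_ID = re.compile(r"COSO[0-9]{8,9}")


def invalid_ids(ids: Iterable[str]) -> List[str]:
    """Return the IDs that do not match the inferred 'COSO' + 8-9 digit shape."""
    return [i for i in ids if COSO_ID.fullmatch(i) is None]


if __name__ == "__main__":
    sample = ["COSO318721653", "COSO97595385", "COSO287721665", "COSO36605385", "COSO60695381"]
    print(invalid_ids(sample))  # prints []
    print(invalid_ids(["COSO123", "coso12345678"]))  # prints ['COSO123', 'coso12345678']
```

`[0-9]` is used instead of `\d` because `\d` also matches non-ASCII digits in Python `str` patterns.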
PRIMARY_SITE
Categorical

High correlation  Imbalance 

Distinct: 2
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.0 KiB
soft_tissue | 93
large_intestine | 1

Length

Max length: 15
Median length: 11
Mean length: 11.042553
Min length: 11

Characters and Unicode

Total characters: 1038
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): 1.1%

Sample

1st row: soft_tissue
2nd row: soft_tissue
3rd row: soft_tissue
4th row: soft_tissue
5th row: soft_tissue

Common Values

Value | Count | Frequency (%)
soft_tissue | 93 | 98.9%
large_intestine | 1 | 1.1%

Length

[Figure: Histogram of lengths of the category]

Most occurring characters

Value | Count | Frequency (%)
s | 280 | 27.0%
t | 188 | 18.1%
e | 96 | 9.2%
i | 95 | 9.2%
_ | 94 | 9.1%
o | 93 | 9.0%
f | 93 | 9.0%
u | 93 | 9.0%
n | 2 | 0.2%
l | 1 | 0.1%
Other values (3) | 3 | 0.3%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 1038 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 1038 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 1038 | 100.0%


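The character tables can be rebuilt from the value counts alone, because each occurrence of a value contributes all of its characters. Below is a minimal sketch using `collections.Counter`, checked against the PRIMARY_SITE figures (93 × soft_tissue, 1 × large_intestine). `char_frequencies` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict


def char_frequencies(value_counts: Dict[str, int]) -> Counter:
    """Count characters across every occurrence of each value."""
    chars: Counter = Counter()
    for value, count in value_counts.items():
        for ch, n in Counter(value).items():
            chars[ch] += n * count
    return chars


if __name__ == "__main__":
    freq = char_frequencies({"soft_tissue": 93, "large_intestine": 1})
    total = sum(freq.values())  # 1038, the "Total characters" figure
    for ch, n in freq.most_common(5):
        print(f"{ch} {n} {n / total:.1%}")
```

The five lines printed should match the top of the PRIMARY_SITE character table: s 280, t 188, e 96, i 95 and _ 94.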
SITE_SUBTYPE_1
Categorical

High correlation  Imbalance 

Distinct: 2
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.2 KiB
fibrous_tissue_and_uncertain_origin | 93
rectum | 1

Length

Max length: 35
Median length: 35
Mean length: 34.691489
Min length: 6

Characters and Unicode

Total characters: 3261
Distinct characters: 16
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): 1.1%

Sample

1st row: fibrous_tissue_and_uncertain_origin
2nd row: fibrous_tissue_and_uncertain_origin
3rd row: fibrous_tissue_and_uncertain_origin
4th row: fibrous_tissue_and_uncertain_origin
5th row: fibrous_tissue_and_uncertain_origin

Common Values

Value | Count | Frequency (%)
fibrous_tissue_and_uncertain_origin | 93 | 98.9%
rectum | 1 | 1.1%

Length

[Figure: Histogram of lengths of the category]

Most occurring characters

Value | Count | Frequency (%)
i | 465 | 14.3%
_ | 372 | 11.4%
n | 372 | 11.4%
u | 280 | 8.6%
r | 280 | 8.6%
s | 279 | 8.6%
t | 187 | 5.7%
e | 187 | 5.7%
o | 186 | 5.7%
a | 186 | 5.7%
Other values (6) | 467 | 14.3%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 3261 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 3261 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 3261 | 100.0%


SITE_SUBTYPE_2
Categorical

Distinct: 28
Distinct (%): 29.8%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB
small_intestine | 10
stomach | 9
large_intestine | 8
retroperitoneum | 6
peritoneum | 5
Other values (23) | 56

Length

Max length: 43
Median length: 20
Mean length: 12.702128
Min length: 2

Characters and Unicode

Total characters: 1194
Distinct characters: 29
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 9
Unique (%): 9.6%

Sample

1st row: retroperitoneum
2nd row: extra-gastrointestinal_site
3rd row: large_intestine
4th row: gastrointestinal_tract_(site_indeterminate)
5th row: mesocolon

Common Values

Value | Count | Frequency (%)
small_intestine | 10 | 10.6%
stomach | 9 | 9.6%
large_intestine | 8 | 8.5%
retroperitoneum | 6 | 6.4%
peritoneum | 5 | 5.3%
gastrointestinal_tract_(site_indeterminate) | 5 | 5.3%
NS | 5 | 5.3%
extra-gastrointestinal_site | 4 | 4.3%
oesophagus | 4 | 4.3%
omentum | 4 | 4.3%
Other values (18) | 34 | 36.2%

Length

[Figure: Histogram of lengths of the category]
Most occurring characters

Value | Count | Frequency (%)
e | 153 | 12.8%
t | 147 | 12.3%
i | 109 | 9.1%
a | 101 | 8.5%
n | 97 | 8.1%
s | 88 | 7.4%
r | 71 | 5.9%
o | 63 | 5.3%
m | 59 | 4.9%
l | 59 | 4.9%
Other values (19) | 247 | 20.7%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 1194 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 1194 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 1194 | 100.0%


SITE_SUBTYPE_3
Categorical

Constant 

Distinct: 1
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.2 KiB
NS | 94

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 188
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NS
2nd row: NS
3rd row: NS
4th row: NS
5th row: NS

Common Values

Value | Count | Frequency (%)
NS | 94 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Most occurring characters

Value | Count | Frequency (%)
N | 94 | 50.0%
S | 94 | 50.0%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 188 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 188 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 188 | 100.0%


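A column is flagged Constant when it has exactly one distinct value. Its Unique count is therefore 0, because no value occurs only once when all 94 rows share it. Below is a minimal sketch over row dicts. It assumes `None` marks a missing value and that missing values are ignored when counting distinct values; this report does not confirm that second point. `constant_columns` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional, Set


def constant_columns(rows: List[Dict[str, Optional[Any]]]) -> List[str]:
    """Return columns whose non-missing values are all identical."""
    distinct: Dict[str, Set[Any]] = {}
    for row in rows:
        for name, value in row.items():
            bucket = distinct.setdefault(name, set())
            if value is not None:
                bucket.add(value)
    # An all-missing column has zero distinct values and is not reported here.
    return [name for name, values in distinct.items() if len(values) == 1]


if __name__ == "__main__":
    rows = [
        {"SITE_SUBTYPE_2": "stomach", "SITE_SUBTYPE_3": "NS"},
        {"SITE_SUBTYPE_2": "omentum", "SITE_SUBTYPE_3": "NS"},
    ]
    print(constant_columns(rows))  # prints ['SITE_SUBTYPE_3']
```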
PRIMARY_HISTOLOGY
Categorical

Constant 

Distinct: 1
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.8 KiB
gastrointestinal_stromal_tumour | 94

Length

Max length: 31
Median length: 31
Mean length: 31
Min length: 31

Characters and Unicode

Total characters: 2914
Distinct characters: 13
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: gastrointestinal_stromal_tumour
2nd row: gastrointestinal_stromal_tumour
3rd row: gastrointestinal_stromal_tumour
4th row: gastrointestinal_stromal_tumour
5th row: gastrointestinal_stromal_tumour

Common Values

Value | Count | Frequency (%)
gastrointestinal_stromal_tumour | 94 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Most occurring characters

Value | Count | Frequency (%)
t | 470 | 16.1%
s | 282 | 9.7%
a | 282 | 9.7%
r | 282 | 9.7%
o | 282 | 9.7%
m | 188 | 6.5%
i | 188 | 6.5%
n | 188 | 6.5%
l | 188 | 6.5%
u | 188 | 6.5%
Other values (3) | 376 | 12.9%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 2914 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 2914 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 2914 | 100.0%


HISTOLOGY_SUBTYPE_1
Categorical

High correlation 

Distinct: 9
Distinct (%): 9.6%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB
NS | 24
spindle | 21
spindle_and_epithelioid | 17
epithelioid | 14
dedifferentiated | 8
Other values (4) | 10

Length

Max length: 46
Median length: 23
Mean length: 11.968085
Min length: 2

Characters and Unicode

Total characters: 1125
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 2
Unique (%): 2.1%

Sample

1st row: dedifferentiated
2nd row: spindle
3rd row: dedifferentiated
4th row: spindle
5th row: NS

Common Values

Value | Count | Frequency (%)
NS | 24 | 25.5%
spindle | 21 | 22.3%
spindle_and_epithelioid | 17 | 18.1%
epithelioid | 14 | 14.9%
dedifferentiated | 8 | 8.5%
transdifferentiated | 6 | 6.4%
diffuse_interstitial_cell_of Cajal_hyperplasia | 2 | 2.1%
spindle_and_epithelial_and_rhabdoid | 1 | 1.1%
unusual_sub-type | 1 | 1.1%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

This table counts lowercased words rather than rows. The value "diffuse_interstitial_cell_of Cajal_hyperplasia" splits at its space, so the total is 96 rather than 94, and the percentages differ slightly from the table above.
Value | Count | Frequency (%)
ns | 24 | 25.0%
spindle | 21 | 21.9%
spindle_and_epithelioid | 17 | 17.7%
epithelioid | 14 | 14.6%
dedifferentiated | 8 | 8.3%
transdifferentiated | 6 | 6.2%
diffuse_interstitial_cell_of | 2 | 2.1%
cajal_hyperplasia | 2 | 2.1%
spindle_and_epithelial_and_rhabdoid | 1 | 1.0%
unusual_sub-type | 1 | 1.0%

Most occurring characters

Value | Count | Frequency (%)
i | 173 | 15.4%
e | 162 | 14.4%
d | 129 | 11.5%
l | 83 | 7.4%
n | 81 | 7.2%
p | 76 | 6.8%
t | 73 | 6.5%
s | 53 | 4.7%
a | 52 | 4.6%
_ | 47 | 4.2%
Other values (14) | 196 | 17.4%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 1125 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 1125 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 1125 | 100.0%


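The short summary at the top of each categorical variable lists the most frequent values and folds the rest into an "Other values (n)" row. "Unique" counts the values that occur exactly once. The sketch below reproduces both statistics for HISTOLOGY_SUBTYPE_1 from its Common Values table. The cut-off of five listed values matches the summaries in this report, but treat `top_n=5` as an assumption about the layout. Both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_values(value_counts: Dict[str, int], top_n: int = 5) -> List[Tuple[str, int, float]]:
    """Return (label, count, percent) rows: the top_n values, then one 'Other values (n)' row."""
    total = sum(value_counts.values())
    ranked = Counter(value_counts).most_common()
    rows = [(v, c, 100.0 * c / total) for v, c in ranked[:top_n]]
    rest = ranked[top_n:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, 100.0 * other / total))
    return rows


def unique_count(value_counts: Dict[str, int]) -> int:
    """Number of distinct values that occur exactly once."""
    return sum(1 for c in value_counts.values() if c == 1)


if __name__ == "__main__":
    histology_subtype_1 = {
        "NS": 24, "spindle": 21, "spindle_and_epithelioid": 17, "epithelioid": 14,
        "dedifferentiated": 8, "transdifferentiated": 6,
        "diffuse_interstitial_cell_of Cajal_hyperplasia": 2,
        "spindle_and_epithelial_and_rhabdoid": 1, "unusual_sub-type": 1,
    }
    for label, count, pct in top_values(histology_subtype_1):
        print(f"{label} {count} {pct:.1f}%")
    print("Unique", unique_count(histology_subtype_1))
```

The last summary row comes out as "Other values (4) 10", and Unique is 2. Both agree with the figures reported for this variable.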
HISTOLOGY_SUBTYPE_2
Categorical

High correlation  Imbalance 

Distinct: 10
Distinct (%): 10.6%
Missing: 0
Missing (%): 0.0%
Memory size: 6.5 KiB
NS | 82
anaplastic_and_spindle | 3
rhabdomyoblastic_and_spindle | 2
anaplastic_and_epithelioid | 1
rhabdomyoblastic_and_epithelioid | 1
Other values (5) | 5

Length

Max length: 51
Median length: 2
Mean length: 5.5
Min length: 2

Characters and Unicode

Total characters: 517
Distinct characters: 22
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 7
Unique (%): 7.4%

Sample

1st row: NS
2nd row: NS
3rd row: anaplastic_and_epithelioid
4th row: NS
5th row: NS

Common Values

Value | Count | Frequency (%)
NS | 82 | 87.2%
anaplastic_and_spindle | 3 | 3.2%
rhabdomyoblastic_and_spindle | 2 | 2.1%
anaplastic_and_epithelioid | 1 | 1.1%
rhabdomyoblastic_and_epithelioid | 1 | 1.1%
plexiform | 1 | 1.1%
anaplastic_and_spindle_and_epithelioid | 1 | 1.1%
rhabdomyoblastic_and_anaplastic | 1 | 1.1%
rhabdomyoblastic_and_epithelioid_and_spindle | 1 | 1.1%
rhabdomyoblastic_and_chondrosarcomatous_and_spindle | 1 | 1.1%

Length

[Figure: Histogram of lengths of the category]

Most occurring characters

Value | Count | Frequency (%)
N | 82 | 15.9%
S | 82 | 15.9%
a | 46 | 8.9%
i | 33 | 6.4%
d | 33 | 6.4%
n | 29 | 5.6%
_ | 28 | 5.4%
l | 25 | 4.8%
s | 22 | 4.3%
o | 21 | 4.1%
Other values (12) | 116 | 22.4%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 517 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 517 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 517 | 100.0%


HISTOLOGY_SUBTYPE_3
Categorical

Constant 

Distinct: 1
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.2 KiB
NS | 94

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 188
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: NS
2nd row: NS
3rd row: NS
4th row: NS
5th row: NS

Common Values

Value | Count | Frequency (%)
NS | 94 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Most occurring characters

Value | Count | Frequency (%)
N | 94 | 50.0%
S | 94 | 50.0%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 188 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 188 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 188 | 100.0%


NCI_CODE
Categorical

High correlation 

Distinct: 7
Distinct (%): 7.4%
Missing: 0
Missing (%): 0.0%
Memory size: 6.5 KiB
C3868 | 24
C27792 | 20
C27793 | 17
C179932 | 14
C3486 | 14
Other values (2) | 5

Length

Max length: 7
Median length: 6
Mean length: 5.712766
Min length: 5

Characters and Unicode

Total characters: 537
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: C179932
2nd row: C27792
3rd row: C179932
4th row: C27792
5th row: C3868

Common Values

Value | Count | Frequency (%)
C3868 | 24 | 25.5%
C27792 | 20 | 21.3%
C27793 | 17 | 18.1%
C179932 | 14 | 14.9%
C3486 | 14 | 14.9%
C5811 | 3 | 3.2%
C27735 | 2 | 2.1%

Length

[Figure: Histogram of lengths of the category]

Most occurring characters

Value | Count | Frequency (%)
C | 94 | 17.5%
7 | 92 | 17.1%
2 | 73 | 13.6%
3 | 71 | 13.2%
8 | 65 | 12.1%
9 | 65 | 12.1%
6 | 38 | 7.1%
1 | 20 | 3.7%
4 | 14 | 2.6%
5 | 5 | 0.9%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 537 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 537 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 537 | 100.0%


EFO
Categorical

High correlation  Imbalance  Missing 

Distinct: 2
Distinct (%): 7.7%
Missing: 68
Missing (%): 72.3%
Memory size: 7.5 KiB
http://purl.obolibrary.org/obo/MONDO_0011719 | 24
http://www.ebi.ac.uk/efo/EFO_1000192 | 2

Length

Max length: 44
Median length: 44
Mean length: 43.384615
Min length: 36

Characters and Unicode

Total characters: 1128
Distinct characters: 32
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: http://purl.obolibrary.org/obo/MONDO_0011719
2nd row: http://purl.obolibrary.org/obo/MONDO_0011719
3rd row: http://purl.obolibrary.org/obo/MONDO_0011719
4th row: http://purl.obolibrary.org/obo/MONDO_0011719
5th row: http://purl.obolibrary.org/obo/MONDO_0011719

Common Values

Value | Count | Frequency (%)
http://purl.obolibrary.org/obo/MONDO_0011719 | 24 | 25.5%
http://www.ebi.ac.uk/efo/EFO_1000192 | 2 | 2.1%
(Missing) | 68 | 72.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Percentages in this table are of the 26 non-missing values.
Value | Count | Frequency (%)
http://purl.obolibrary.org/obo/mondo_0011719 | 24 | 92.3%
http://www.ebi.ac.uk/efo/efo_1000192 | 2 | 7.7%

Most occurring characters

Value | Count | Frequency (%)
o | 122 | 10.8%
/ | 104 | 9.2%
r | 96 | 8.5%
1 | 76 | 6.7%
b | 74 | 6.6%
. | 54 | 4.8%
0 | 54 | 4.8%
t | 52 | 4.6%
p | 50 | 4.4%
O | 50 | 4.4%
Other values (22) | 396 | 35.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 1128 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 1128 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 1128 | 100.0%


Correlations

Variable | EFO | HISTOLOGY_SUBTYPE_1 | HISTOLOGY_SUBTYPE_2 | NCI_CODE | PRIMARY_SITE | SITE_SUBTYPE_1 | SITE_SUBTYPE_2
EFO | 1.000 | 0.979 | 1.000 | 0.716 | 0.252 | 0.252 | 0.000
HISTOLOGY_SUBTYPE_1 | 0.979 | 1.000 | 0.346 | 0.900 | 0.000 | 0.000 | 0.000
HISTOLOGY_SUBTYPE_2 | 1.000 | 0.346 | 1.000 | 0.200 | 0.000 | 0.000 | 0.000
NCI_CODE | 0.716 | 0.900 | 0.200 | 1.000 | 0.659 | 0.659 | 0.000
PRIMARY_SITE | 0.252 | 0.000 | 0.000 | 0.659 | 1.000 | 0.486 | 0.000
SITE_SUBTYPE_1 | 0.252 | 0.000 | 0.000 | 0.659 | 0.486 | 1.000 | 0.000
SITE_SUBTYPE_2 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000

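For pairs of categorical columns, ydata-profiling's default "auto" correlation appears to be a bias-corrected Cramér's V. The sketch below implements that statistic in plain Python: Yates' continuity correction on 2 × 2 tables, followed by the Bergsma small-sample correction. As a check, take the 2 × 2 table implied by PRIMARY_SITE and SITE_SUBTYPE_1. This assumes the single large_intestine row is the single rectum row, which the report does not state explicitly. The table gives 0.486, matching the matrix above, even though the two columns map one-to-one. The small-sample and continuity corrections are what pull the value below 1. The function names are hypothetical, and the library may handle missing values or degenerate tables differently.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def contingency_table(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> List[List[int]]:
    """Cross-tabulate two equally long sequences of category labels."""
    pairs = Counter(zip(xs, ys))
    rows = sorted(set(xs), key=str)
    cols = sorted(set(ys), key=str)
    return [[pairs[(r, c)] for c in cols] for r in rows]


def cramers_v_corrected(table: List[List[int]]) -> float:
    """Bias-corrected Cramér's V; every row and column total must be positive."""
    n = sum(sum(row) for row in table)
    r = len(table)
    k = len(table[0]) if table else 0
    if n < 2 or r < 2 or k < 2:
        return float("nan")  # degenerate table (e.g. a constant column); a choice of this sketch
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    yates = (r - 1) * (k - 1) == 1  # continuity correction only with one degree of freedom
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                diff -= min(0.5, diff)  # shrink |O - E| by at most 0.5
            chi2 += diff * diff / expected
    phi2 = chi2 / n
    # Bergsma's small-sample bias corrections.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return float("nan")
    return math.sqrt(phi2_corr / denom)


if __name__ == "__main__":
    primary_site = ["soft_tissue"] * 93 + ["large_intestine"]
    site_subtype_1 = ["fibrous_tissue_and_uncertain_origin"] * 93 + ["rectum"]
    table = contingency_table(primary_site, site_subtype_1)
    print(table)  # prints [[0, 1], [93, 0]]
    print(round(cramers_v_corrected(table), 3))  # prints 0.486
```

Without the Yates step, the same table would score exactly 1.0. This makes the reported 0.486 a useful sanity check on which corrections are in play.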
Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix, a data-dense display that lets you quickly pick out visual patterns in data completion.]

Sample

Index | COSMIC_PHENOTYPE_ID | PRIMARY_SITE | SITE_SUBTYPE_1 | SITE_SUBTYPE_2 | SITE_SUBTYPE_3 | PRIMARY_HISTOLOGY | HISTOLOGY_SUBTYPE_1 | HISTOLOGY_SUBTYPE_2 | HISTOLOGY_SUBTYPE_3 | NCI_CODE | EFO
170 | COSO318721653 | soft_tissue | fibrous_tissue_and_uncertain_origin | retroperitoneum | NS | gastrointestinal_stromal_tumour | dedifferentiated | NS | NS | C179932 | NaN
317 | COSO97595385 | soft_tissue | fibrous_tissue_and_uncertain_origin | extra-gastrointestinal_site | NS | gastrointestinal_stromal_tumour | spindle | NS | NS | C27792 | NaN
322 | COSO287721665 | soft_tissue | fibrous_tissue_and_uncertain_origin | large_intestine | NS | gastrointestinal_stromal_tumour | dedifferentiated | anaplastic_and_epithelioid | NS | C179932 | NaN
365 | COSO36605385 | soft_tissue | fibrous_tissue_and_uncertain_origin | gastrointestinal_tract_(site_indeterminate) | NS | gastrointestinal_stromal_tumour | spindle | NS | NS | C27792 | NaN
394 | COSO60695381 | soft_tissue | fibrous_tissue_and_uncertain_origin | mesocolon | NS | gastrointestinal_stromal_tumour | NS | NS | NS | C3868 | http://purl.obolibrary.org/obo/MONDO_0011719
640 | COSO34435546 | soft_tissue | fibrous_tissue_and_uncertain_origin | mediastinum | NS | gastrointestinal_stromal_tumour | epithelioid | NS | NS | C3486 | NaN
709 | COSO35205546 | soft_tissue | fibrous_tissue_and_uncertain_origin | mesentery | NS | gastrointestinal_stromal_tumour | epithelioid | NS | NS | C3486 | NaN
855 | COSO31875763 | soft_tissue | fibrous_tissue_and_uncertain_origin | retroperitoneum | NS | gastrointestinal_stromal_tumour | spindle_and_epithelioid | NS | NS | C27793 | NaN
911 | COSO28815381 | soft_tissue | fibrous_tissue_and_uncertain_origin | small_intestine | NS | gastrointestinal_stromal_tumour | NS | NS | NS | C3868 | http://purl.obolibrary.org/obo/MONDO_0011719
932 | COSO36845546 | soft_tissue | fibrous_tissue_and_uncertain_origin | pelvic_cavity | NS | gastrointestinal_stromal_tumour | epithelioid | NS | NS | C3486 | NaN

Index | COSMIC_PHENOTYPE_ID | PRIMARY_SITE | SITE_SUBTYPE_1 | SITE_SUBTYPE_2 | SITE_SUBTYPE_3 | PRIMARY_HISTOLOGY | HISTOLOGY_SUBTYPE_1 | HISTOLOGY_SUBTYPE_2 | HISTOLOGY_SUBTYPE_3 | NCI_CODE | EFO
6680 | COSO36075546 | soft_tissue | fibrous_tissue_and_uncertain_origin | stomach | NS | gastrointestinal_stromal_tumour | epithelioid | NS | NS | C3486 | NaN
6690 | COSO36605381 | soft_tissue | fibrous_tissue_and_uncertain_origin | gastrointestinal_tract_(site_indeterminate) | NS | gastrointestinal_stromal_tumour | NS | NS | NS | C3868 | http://purl.obolibrary.org/obo/MONDO_0011719
6736 | COSO57285385 | soft_tissue | fibrous_tissue_and_uncertain_origin | hip | NS | gastrointestinal_stromal_tumour | spindle | NS | NS | C27792 | NaN
6778 | COSO318721673 | soft_tissue | fibrous_tissue_and_uncertain_origin | retroperitoneum | NS | gastrointestinal_stromal_tumour | transdifferentiated | rhabdomyoblastic_and_chondrosarcomatous_and_spindle | NS | C179932 | NaN
6822 | COSO37735546 | soft_tissue | fibrous_tissue_and_uncertain_origin | abdomen | NS | gastrointestinal_stromal_tumour | epithelioid | NS | NS | C3486 | NaN
6839 | COSO37565381 | soft_tissue | fibrous_tissue_and_uncertain_origin | liver | NS | gastrointestinal_stromal_tumour | NS | NS | NS | C3868 | http://purl.obolibrary.org/obo/MONDO_0011719
6941 | COSO31355546 | soft_tissue | fibrous_tissue_and_uncertain_origin | NS | NS | gastrointestinal_stromal_tumour | epithelioid | NS | NS | C3486 | NaN
6957 | COSO34395385 | soft_tissue | fibrous_tissue_and_uncertain_origin | abdominal_wall | NS | gastrointestinal_stromal_tumour | spindle | NS | NS | C27792 | NaN
7052 | COSO37735763 | soft_tissue | fibrous_tissue_and_uncertain_origin | abdomen | NS | gastrointestinal_stromal_tumour | spindle_and_epithelioid | NS | NS | C27793 | NaN
7124 | COSO287721654 | soft_tissue | fibrous_tissue_and_uncertain_origin | large_intestine | NS | gastrointestinal_stromal_tumour | dedifferentiated | anaplastic_and_spindle | NS | C179932 | NaN