Overview

Brought to you by YData

Dataset statistics

Number of variables: 22
Number of observations: 6934
Missing cells: 58404
Missing cells (%): 38.3%
Duplicate rows: 383
Duplicate rows (%): 5.5%
Total size in memory: 5.6 MiB
Average record size in memory: 854.2 B
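
The percentage and size figures in this block follow from the raw counts. A small Python check, using only the numbers reported here (the variable names are illustrative, and the total size is back-computed from the average record size, so it is an approximation):

```python
# Figures copied from the "Dataset statistics" block.
n_variables = 22
n_observations = 6934
missing_cells = 58404
duplicate_rows = 383
avg_record_bytes = 854.2

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations

missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
total_mib = avg_record_bytes * n_observations / 2**20  # bytes -> MiB

print(f"Missing cells (%): {missing_pct:.1f}")
print(f"Duplicate rows (%): {duplicate_pct:.1f}")
print(f"Total size in memory: {total_mib:.1f} MiB")
```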

Variable types

Text: 4
Numeric: 6
Categorical: 11
DateTime: 1

Alerts

Dataset has 383 (5.5%) duplicate rows [Duplicates]
Age at Which Sequencing was Reported (Years) is highly overall correlated with age_at_diagnosis and 2 other fields [High correlation]
Metastatic Site is highly overall correlated with sample_type and 2 other fields [High correlation]
age_at_diagnosis is highly overall correlated with Age at Which Sequencing was Reported (Years) [High correlation]
mitotic_rate is highly overall correlated with source and 1 other fields [High correlation]
os_status is highly overall correlated with source [High correlation]
primary_site is highly overall correlated with sample_type and 1 other fields [High correlation]
race is highly overall correlated with source [High correlation]
sample_coverage is highly overall correlated with source and 1 other fields [High correlation]
sample_type is highly overall correlated with Metastatic Site and 3 other fields [High correlation]
source is highly overall correlated with Age at Which Sequencing was Reported (Years) and 10 other fields [High correlation]
stage_at_diagnosis is highly overall correlated with source [High correlation]
treatment is highly overall correlated with sample_type and 1 other fields [High correlation]
tumor_grade is highly overall correlated with Age at Which Sequencing was Reported (Years) and 5 other fields [High correlation]
tumor_purity is highly overall correlated with tumor_grade [High correlation]
tumor_size is highly overall correlated with source and 1 other fields [High correlation]
treatment_response is highly imbalanced (55.5%) [Imbalance]
primary_site is highly imbalanced (50.1%) [Imbalance]
os_status is highly imbalanced (51.2%) [Imbalance]
sample_id has 5236 (75.5%) missing values [Missing]
Age at Which Sequencing was Reported (Years) has 6064 (87.5%) missing values [Missing]
tumor_size has 6064 (87.5%) missing values [Missing]
mitotic_rate has 6111 (88.1%) missing values [Missing]
Metastatic Site has 6064 (87.5%) missing values [Missing]
tumor_purity has 6042 (87.1%) missing values [Missing]
sample_coverage has 6064 (87.5%) missing values [Missing]
os_months has 3687 (53.2%) missing values [Missing]
treatment_start has 6306 (90.9%) missing values [Missing]
os_status has 3636 (52.4%) missing values [Missing]
mutated_genes has 3130 (45.1%) missing values [Missing]

Reproduction

Analysis started: 2025-08-29 21:01:41.494210
Analysis finished: 2025-08-29 21:01:48.916942
Duration: 7.42 seconds
Software version: ydata-profiling v4.16.1
Download configuration: config.json
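
The reported duration is the difference between the two timestamps. A minimal sketch in Python, using the standard library only (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2025-08-29 21:01:41.494210")
finished = datetime.fromisoformat("2025-08-29 21:01:48.916942")

# Subtracting two datetimes yields a timedelta.
duration_s = (finished - started).total_seconds()
print(f"Duration: {duration_s:.2f} seconds")
```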

Variables

sample_id
Text

Missing 

Distinct: 675
Distinct (%): 39.8%
Missing: 5236
Missing (%): 75.5%
Memory size: 268.3 KiB

Length

Max length: 17
Median length: 17
Mean length: 14.021201
Min length: 10

Characters and Unicode

Total characters: 23808
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 337
Unique (%): 19.8%
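
The Distinct (%) and Unique (%) figures for this variable appear to be taken over the non-missing rows rather than over all 6934 observations. A minimal check in Python, using only the figures reported for sample_id (the variable names are illustrative, and the choice of denominator is an inference from the numbers, not something the report states):

```python
# Figures copied from the sample_id summary.
n_observations = 6934
missing = 5236
distinct = 675
unique = 337
mean_length = 14.021201

# Rows that actually carry a sample_id.
present = n_observations - missing

missing_pct = 100 * missing / n_observations
distinct_pct = 100 * distinct / present
unique_pct = 100 * unique / present

# Mean length times the number of present values gives the character total.
total_characters = round(mean_length * present)

print(present, f"{missing_pct:.1f}", f"{distinct_pct:.1f}", f"{unique_pct:.1f}", total_characters)
```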

Sample

1st row: P-0000134-T02-IM3
2nd row: P-0000134-T02-IM3
3rd row: P-0000306-T01-IM3
4th row: P-0000501-T02-IM3
5th row: P-0000501-T02-IM3

Value               Count  Frequency (%)
p-0001315-t01-im3   66     3.9%
p-0002672-t01-im3   32     1.9%
p-0013110-t01-im5   30     1.8%
p-0012178-t01-im5   24     1.4%
p-0005066-t01-im5   21     1.2%
p-0012564-t01-im5   20     1.2%
p-0013393-t01-im5   20     1.2%
p-0002409-t01-im3   18     1.1%
p-0005330-t02-im5   18     1.1%
p-0004937-t01-im5   18     1.1%
Other values (665)  1431   84.3%

Most occurring characters

Value             Count  Frequency (%)
0                 4018   16.9%
-                 2610   11.0%
1                 2552   10.7%
S                 1656   7.0%
5                 1346   5.7%
3                 1322   5.6%
2                 1278   5.4%
4                 964    4.0%
P                 870    3.7%
T                 870    3.7%
Other values (8)  6322   26.6%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  23808  100.0%
Script    (unknown)  23808  100.0%
Block     (unknown)  23808  100.0%

Distinct: 2862
Distinct (%): 41.3%
Missing: 0
Missing (%): 0.0%
Memory size: 381.7 KiB

Length

Max length: 36
Median length: 9
Mean length: 7.3442457
Min length: 3

Characters and Unicode

Total characters: 50925
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2558
Unique (%): 36.9%

Sample

1st row: P-0000134
2nd row: P-0000134
3rd row: P-0000306
4th row: P-0000501
5th row: P-0000501

Value                Count  Frequency (%)
111316               2106   30.4%
627122               200    2.9%
429767               128    1.8%
814656               108    1.6%
636974               98     1.4%
949853               84     1.2%
p-0001315            72     1.0%
p-0002672            32     0.5%
p-0013110            30     0.4%
p-0012178            24     0.3%
Other values (2852)  4052   58.4%

Most occurring characters

Value             Count  Frequency (%)
1                 13683  26.9%
6                 5769   11.3%
3                 5161   10.1%
0                 5071   10.0%
2                 4434   8.7%
9                 3126   6.1%
4                 3101   6.1%
5                 2788   5.5%
7                 2631   5.2%
8                 2250   4.4%
Other values (8)  2911   5.7%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  50925  100.0%
Script    (unknown)  50925  100.0%
Block     (unknown)  50925  100.0%

age_at_diagnosis
Real number (ℝ)

High correlation 

Distinct: 73
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 58.860975
Minimum: 12
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 54.3 KiB

Quantile statistics

Minimum: 12
5-th percentile: 39
Q1: 55
median: 58
Q3: 66
95-th percentile: 80
Maximum: 90
Range: 78
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 11.750974
Coefficient of variation (CV): 0.19963947
Kurtosis: 0.55650135
Mean: 58.860975
Median Absolute Deviation (MAD): 5
Skewness: -0.070998607
Sum: 408142
Variance: 138.08539
Monotonicity: Not monotonic
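
Several of the age_at_diagnosis figures are tied together by simple identities: range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, and sum = mean x count. A short Python sketch that recomputes them from the reported values (the variable names are illustrative; the count of 6934 assumes no missing values, as reported for this variable):

```python
# Figures copied from the age_at_diagnosis summary.
n = 6934  # all observations are present for this variable
mean = 58.860975
std = 11.750974
minimum, maximum = 12, 90
q1, q3 = 55, 66

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance
total = mean * n                 # Sum

print(value_range, iqr, f"{cv:.6f}", f"{variance:.3f}", round(total))
```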
Histogram with fixed size bins (bins=50)

Common Values

Value              Count  Frequency (%)
58                 2212   31.9%
39                 319    4.6%
66                 238    3.4%
57                 219    3.2%
53                 208    3.0%
59                 163    2.4%
56                 142    2.0%
60                 137    2.0%
67                 130    1.9%
64                 130    1.9%
Other values (63)  3036   43.8%

Minimum 10 values

Value  Count  Frequency (%)
12     1      < 0.1%
14     1      < 0.1%
18     1      < 0.1%
19     6      0.1%
22     1      < 0.1%
23     3      < 0.1%
24     3      < 0.1%
25     6      0.1%
26     4      0.1%
27     11     0.2%

Maximum 10 values

Value  Count  Frequency (%)
90     42     0.6%
89     11     0.2%
88     20     0.3%
87     22     0.3%
86     25     0.4%
85     37     0.5%
84     43     0.6%
83     37     0.5%
82     28     0.4%
81     40     0.6%

Age at Which Sequencing was Reported (Years)
Real number (ℝ)

High correlation  Missing 

Distinct: 51
Distinct (%): 5.9%
Missing: 6064
Missing (%): 87.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 57.331034
Minimum: 28
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 54.3 KiB

Quantile statistics

Minimum: 28
5-th percentile: 36
Q1: 49
median: 58
Q3: 65
95-th percentile: 77
Maximum: 90
Range: 62
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 12.167732
Coefficient of variation (CV): 0.21223639
Kurtosis: -0.35045886
Mean: 57.331034
Median Absolute Deviation (MAD): 8
Skewness: -0.02569774
Sum: 49878
Variance: 148.0537
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value              Count  Frequency (%)
36                 66     1.0%
60                 54     0.8%
61                 52     0.7%
53                 52     0.7%
54                 41     0.6%
65                 39     0.6%
49                 39     0.6%
64                 37     0.5%
58                 33     0.5%
50                 31     0.4%
Other values (41)  426    6.1%
(Missing)          6064   87.5%

Minimum 10 values

Value  Count  Frequency (%)
28     3      < 0.1%
29     1      < 0.1%
31     6      0.1%
32     1      < 0.1%
33     4      0.1%
34     3      < 0.1%
36     66     1.0%
39     7      0.1%
42     14     0.2%
43     6      0.1%

Maximum 10 values

Value  Count  Frequency (%)
90     3      < 0.1%
88     3      < 0.1%
84     3      < 0.1%
83     5      0.1%
81     3      < 0.1%
80     3      < 0.1%
79     10     0.1%
78     6      0.1%
77     31     0.4%
76     2      < 0.1%

stage_at_diagnosis
Categorical

High correlation 

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 392.0 KiB

Length

Max length: 10
Median length: 9
Mean length: 8.863715
Min length: 7

Characters and Unicode

Total characters: 61461
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Metastatic
2nd row: Metastatic
3rd row: Localized
4th row: Localized
5th row: Localized

Common Values

Value       Count  Frequency (%)
Metastatic  3279   47.3%
Unknown     1925   27.8%
Localized   1356   19.6%
Regional    374    5.4%

Length

Histogram of lengths of the category

Value       Count  Frequency (%)
metastatic  3279   47.3%
unknown     1925   27.8%
localized   1356   19.6%
regional    374    5.4%

Most occurring characters

Value             Count  Frequency (%)
t                 9837   16.0%
a                 8288   13.5%
n                 6149   10.0%
e                 5009   8.1%
i                 5009   8.1%
c                 4635   7.5%
o                 3655   5.9%
M                 3279   5.3%
s                 3279   5.3%
U                 1925   3.1%
Other values (8)  10396  16.9%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  61461  100.0%
Script    (unknown)  61461  100.0%
Block     (unknown)  61461  100.0%

tumor_size
Real number (ℝ)

High correlation  Missing 

Distinct: 53
Distinct (%): 6.1%
Missing: 6064
Missing (%): 87.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.036897
Minimum: 1.4
Maximum: 26
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 54.3 KiB

Quantile statistics

Minimum: 1.4
5-th percentile: 2.8
Q1: 6.9
median: 10
Q3: 14.6
95-th percentile: 24
Maximum: 26
Range: 24.6
Interquartile range (IQR): 7.7

Descriptive statistics

Standard deviation: 5.9330815
Coefficient of variation (CV): 0.53756792
Kurtosis: -0.19983541
Mean: 11.036897
Median Absolute Deviation (MAD): 4
Skewness: 0.66844217
Sum: 9602.1
Variance: 35.201456
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value              Count  Frequency (%)
8                  94     1.4%
15                 51     0.7%
4                  44     0.6%
10                 42     0.6%
14                 36     0.5%
12                 35     0.5%
11                 30     0.4%
6                  30     0.4%
14.6               30     0.4%
20                 25     0.4%
Other values (43)  453    6.5%
(Missing)          6064   87.5%

Minimum 10 values

Value  Count  Frequency (%)
1.4    4      0.1%
2.1    2      < 0.1%
2.3    12     0.2%
2.4    20     0.3%
2.8    12     0.2%
3      4      0.1%
3.4    2      < 0.1%
3.5    14     0.2%
3.6    4      0.1%
3.8    4      0.1%

Maximum 10 values

Value  Count  Frequency (%)
26     8      0.1%
25     24     0.3%
24     23     0.3%
21     13     0.2%
20     25     0.4%
19.5   10     0.1%
18.5   12     0.2%
18     6      0.1%
17.9   11     0.2%
17     15     0.2%

mitotic_rate
Real number (ℝ)

High correlation  Missing 

Distinct: 39
Distinct (%): 4.7%
Missing: 6111
Missing (%): 88.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 27.641555
Minimum: 0
Maximum: 112
Zeros: 44
Zeros (%): 0.6%
Negative: 0
Negative (%): 0.0%
Memory size: 54.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 5
median: 20
Q3: 48
95-th percentile: 90
Maximum: 112
Range: 112
Interquartile range (IQR): 43

Descriptive statistics

Standard deviation: 27.142842
Coefficient of variation (CV): 0.98195786
Kurtosis: 1.784581
Mean: 27.641555
Median Absolute Deviation (MAD): 18
Skewness: 1.3771716
Sum: 22749
Variance: 736.73389
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=39)

Common Values

Value              Count  Frequency (%)
48                 72     1.0%
50                 71     1.0%
5                  54     0.8%
2                  53     0.8%
15                 44     0.6%
0                  44     0.6%
10                 42     0.6%
20                 41     0.6%
12                 32     0.5%
112                32     0.5%
Other values (29)  338    4.9%
(Missing)          6111   88.1%

Minimum 10 values

Value  Count  Frequency (%)
0      44     0.6%
1      27     0.4%
2      53     0.8%
3      12     0.2%
4      24     0.3%
5      54     0.8%
6      8      0.1%
7      17     0.2%
8      18     0.3%
10     42     0.6%

Maximum 10 values

Value  Count  Frequency (%)
112    32     0.5%
104    2      < 0.1%
90     16     0.2%
75     9      0.1%
55     20     0.3%
50     71     1.0%
48     72     1.0%
47     6      0.1%
46     4      0.1%
45     2      < 0.1%

treatment
Categorical

High correlation 

Distinct: 28
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 390.0 KiB

Length

Max length: 31
Median length: 21
Mean length: 8.5754254
Min length: 4

Characters and Unicode

Total characters: 59462
Distinct characters: 27
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): 0.1%

Sample

1st row: OTHER
2nd row: OTHER
3rd row: OTHER
4th row: OTHER
5th row: OTHER

Common Values

Value                 Count  Frequency (%)
SURGERY               2429   35.0%
IMATINIB              2206   31.8%
RIPRETINIB            1053   15.2%
TREATMENT_NAIVE       286    4.1%
OTHER                 269    3.9%
NO_CURRENT_THERAPY    116    1.7%
SUNITINIB             112    1.6%
UNKNOWN               94     1.4%
CLINICAL_TRIAL        84     1.2%
IMATINIB + SUNITINIB  75     1.1%
Other values (18)     210    3.0%

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
surgery             2429   34.1%
imatinib            2302   32.3%
ripretinib          1053   14.8%
treatment_naive     286    4.0%
other               269    3.8%
sunitinib           188    2.6%
no_current_therapy  116    1.6%
unknown             113    1.6%
(blank)             99     1.4%
clinical_trial      84     1.2%
Other values (14)   193    2.7%

Most occurring characters

Value              Count  Frequency (%)
I                  11424  19.2%
R                  8151   13.7%
N                  5147   8.7%
E                  5048   8.5%
T                  5016   8.4%
B                  3708   6.2%
A                  3333   5.6%
U                  2867   4.8%
S                  2680   4.5%
M                  2622   4.4%
Other values (17)  9466   15.9%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  59462  100.0%
Script    (unknown)  59462  100.0%
Block     (unknown)  59462  100.0%

treatment_response
Categorical

Imbalance 

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 372.0 KiB

Length

Max length: 7
Median length: 7
Mean length: 5.9176521
Min length: 2

Characters and Unicode

Total characters: 41033
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: UNKNOWN
2nd row: UNKNOWN
3rd row: UNKNOWN
4th row: UNKNOWN
5th row: UNKNOWN

Common Values

Value    Count  Frequency (%)
UNKNOWN  5433   78.4%
NR       842    12.1%
SD       226    3.3%
PR       180    2.6%
CR       162    2.3%
NE       91     1.3%
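
The Imbalance alert for treatment_response reports 55.5%. One score that is consistent with that figure is one minus the Shannon entropy of the class distribution, normalised by log2 of the number of classes. The Python sketch below assumes this definition (it is inferred from the counts, not stated in the report), and `imbalance_score` is a hypothetical helper name:

```python
import math

# Category counts copied from the treatment_response "Common Values" table.
counts = {"UNKNOWN": 5433, "NR": 842, "SD": 226, "PR": 180, "CR": 162, "NE": 91}


def imbalance_score(class_counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy in bits.

    The score is 0 for a perfectly uniform split and approaches 1 as a
    single class dominates. Returning 0 for fewer than two classes is a
    convention chosen for this sketch.
    """
    k = len(class_counts)
    if k < 2:
        return 0.0
    total = sum(class_counts)
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in class_counts if c > 0
    )
    return 1 - entropy / math.log2(k)


score = imbalance_score(list(counts.values()))
print(f"Imbalance: {score:.1%}")
```

Under this assumption the result should land close to the value shown in the alert; a uniform distribution over the same six classes would score 0.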

Length

Histogram of lengths of the category

Value    Count  Frequency (%)
unknown  5433   78.4%
nr       842    12.1%
sd       226    3.3%
pr       180    2.6%
cr       162    2.3%
ne       91     1.3%

Most occurring characters

Value  Count  Frequency (%)
N      17232  42.0%
U      5433   13.2%
K      5433   13.2%
O      5433   13.2%
W      5433   13.2%
R      1184   2.9%
S      226    0.6%
D      226    0.6%
P      180    0.4%
C      162    0.4%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  41033  100.0%
Script    (unknown)  41033  100.0%
Block     (unknown)  41033  100.0%

primary_site
Categorical

High correlation  Imbalance 

Distinct: 24
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 397.9 KiB

Length

Max length: 37
Median length: 30
Mean length: 9.7487742
Min length: 4

Characters and Unicode

Total characters: 67598
Distinct characters: 45
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6
Unique (%): 0.1%

Sample

1st row: Stomach
2nd row: Stomach
3rd row: Stomach
4th row: Stomach
5th row: Stomach

Common Values

Value                                  Count  Frequency (%)
Stomach                                2621   37.8%
Liver                                  2114   30.5%
Small Intestine                        1261   18.2%
Abdomen/Intraabdominal                 373    5.4%
GI Tract (Indeterminate)               101    1.5%
Colon And Rectum (Excluding Appendix)  98     1.4%
Retroperitoneum                        92     1.3%
Digestive Other                        76     1.1%
Soft Tissue                            60     0.9%
Colon/Rectum                           43     0.6%
Other values (14)                      95     1.4%

Length

Histogram of lengths of the category

Value                   Count  Frequency (%)
stomach                 2621   29.1%
liver                   2114   23.5%
small                   1261   14.0%
intestine               1261   14.0%
abdomen/intraabdominal  373    4.1%
and                     129    1.4%
retroperitoneum         121    1.3%
appendix                103    1.1%
gi                      101    1.1%
tract                   101    1.1%
Other values (29)       822    9.1%

Most occurring characters

Value              Count  Frequency (%)
t                  6489   9.6%
e                  6332   9.4%
a                  5304   7.8%
m                  5034   7.4%
n                  4526   6.7%
i                  4455   6.6%
o                  4049   6.0%
S                  3942   5.8%
l                  3188   4.7%
r                  3066   4.5%
Other values (35)  21213  31.4%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  67598  100.0%
Script    (unknown)  67598  100.0%
Block     (unknown)  67598  100.0%

sample_type
Categorical

High correlation 

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 390.2 KiB

Length

Max length: 16
Median length: 7
Mean length: 8.6111912
Min length: 7

Characters and Unicode

Total characters: 59710
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Metastasis
2nd row: Metastasis
3rd row: Primary
4th row: Metastasis
5th row: Metastasis

Common Values

Value             Count  Frequency (%)
Metastasis        3097   44.7%
Unknown           2580   37.2%
Primary           1048   15.1%
Local Recurrence  209    3.0%

Length

Histogram of lengths of the category

Value       Count  Frequency (%)
metastasis  3097   43.4%
unknown     2580   36.1%
primary     1048   14.7%
local       209    2.9%
recurrence  209    2.9%

Most occurring characters

Value              Count  Frequency (%)
s                  9291   15.6%
n                  7949   13.3%
a                  7451   12.5%
t                  6194   10.4%
i                  4145   6.9%
e                  3724   6.2%
M                  3097   5.2%
o                  2789   4.7%
U                  2580   4.3%
k                  2580   4.3%
Other values (11)  9910   16.6%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  59710  100.0%
Script    (unknown)  59710  100.0%
Block     (unknown)  59710  100.0%

race
Categorical

High correlation 

Distinct: 8
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 397.3 KiB

Length

Max length: 57
Median length: 5
Mean length: 9.6592155
Min length: 5

Characters and Unicode

Total characters: 66977
Distinct characters: 31
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: White
2nd row: White
3rd row: White
4th row: Black or African American
5th row: Black or African American

Common Values

Value                                                      Count  Frequency (%)
White                                                      4601   66.4%
Unknown                                                    995    14.3%
Black                                                      465    6.7%
Other (American Indian/AK Native, Asian/Pacific Islander)  443    6.4%
Black or African American                                  363    5.2%
Asian                                                      48     0.7%
Other                                                      16     0.2%
Not Provided                                               3      < 0.1%

Length

Histogram of lengths of the category

Value             Count  Frequency (%)
white             4601   44.9%
unknown           995    9.7%
black             828    8.1%
american          806    7.9%
other             459    4.5%
indian/ak         443    4.3%
native            443    4.3%
asian/pacific     443    4.3%
islander          443    4.3%
or                363    3.5%
Other values (4)  417    4.1%
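
The word counts differ from the category counts because a multi-word category contributes to several tokens; for example, black = 465 + 363 = 828. The Python sketch below rebuilds the word table from the race category counts. The tokenisation rule (split on whitespace, lowercase, strip surrounding parentheses and commas) is an assumption inferred from the reported tokens, and `tokenize` is a hypothetical helper name:

```python
from collections import Counter

# Category counts copied from the race "Common Values" table.
race_counts = {
    "White": 4601,
    "Unknown": 995,
    "Black": 465,
    "Other (American Indian/AK Native, Asian/Pacific Islander)": 443,
    "Black or African American": 363,
    "Asian": 48,
    "Other": 16,
    "Not Provided": 3,
}


def tokenize(value):
    """Split on whitespace, lowercase, and strip surrounding '(', ')' and ','."""
    tokens = (part.strip("(),").lower() for part in value.split())
    return [token for token in tokens if token]


# Each category adds its row count to every token it contains.
word_counts = Counter()
for value, count in race_counts.items():
    for token in tokenize(value):
        word_counts[token] += count

total_tokens = sum(word_counts.values())
for word in ("white", "black", "american", "other"):
    print(word, word_counts[word], f"{word_counts[word] / total_tokens:.1%}")
```

Under these assumptions the four tokens outside the top ten (african, asian, not, provided) add up to the 417 reported under "Other values (4)".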

Most occurring characters

Value              Count  Frequency (%)
i                  8036   12.0%
e                  6755   10.1%
n                  5974   8.9%
t                  5506   8.2%
h                  5060   7.6%
W                  4601   6.9%
a                  4260   6.4%
(space)            3307   4.9%
c                  2883   4.3%
r                  2437   3.6%
Other values (21)  18158  27.1%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  66977  100.0%
Script    (unknown)  66977  100.0%
Block     (unknown)  66977  100.0%

gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 363.2 KiB

Length

Max length: 6
Median length: 4
Mean length: 4.6140756
Min length: 4

Characters and Unicode

Total characters: 31994
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Female
2nd row: Female
3rd row: Male
4th row: Male
5th row: Male

Common Values

Value   Count  Frequency (%)
Male    4805   69.3%
Female  2129   30.7%

Length

Histogram of lengths of the category

Value   Count  Frequency (%)
male    4805   69.3%
female  2129   30.7%

Most occurring characters

Value  Count  Frequency (%)
e      9063   28.3%
a      6934   21.7%
l      6934   21.7%
M      4805   15.0%
F      2129   6.7%
m      2129   6.7%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  31994  100.0%
Script    (unknown)  31994  100.0%
Block     (unknown)  31994  100.0%

Metastatic Site
Categorical

High correlation  Missing 

Distinct: 17
Distinct (%): 2.0%
Missing: 6064
Missing (%): 87.5%
Memory size: 381.7 KiB

Length

Max length: 14
Median length: 13
Mean length: 9.7954023
Min length: 4

Characters and Unicode

Total characters: 8522
Distinct characters: 30
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Liver
2nd row: Liver
3rd row: Not Applicable
4th row: Liver
5th row: Liver

Common Values

Value             Count  Frequency (%)
Not Applicable    375    5.4%
Liver             282    4.1%
Pelvis            33     0.5%
Mesentery         29     0.4%
Spleen            26     0.4%
Small Bowel       21     0.3%
Peritoneum        18     0.3%
Abdomen           18     0.3%
Abdominal Wall    15     0.2%
Pleura            12     0.2%
Other values (7)  41     0.6%
(Missing)         6064   87.5%

Length

Histogram of lengths of the category

Value              Count  Frequency (%)
not                379    29.0%
applicable         375    28.7%
liver              282    21.6%
pelvis             36     2.8%
mesentery          29     2.2%
spleen             26     2.0%
small              21     1.6%
bowel              21     1.6%
abdominal          21     1.6%
peritoneum         18     1.4%
Other values (11)  98     7.5%

Most occurring characters

Value              Count  Frequency (%)
l                  973    11.4%
e                  960    11.3%
p                  776    9.1%
i                  754    8.8%
a                  477    5.6%
o                  469    5.5%
(space)            436    5.1%
t                  430    5.0%
b                  418    4.9%
A                  414    4.9%
Other values (20)  2415   28.3%

Most occurring categories, scripts, and blocks

Grouping  Value      Count  Frequency (%)
Category  (unknown)  8522   100.0%
Script    (unknown)  8522   100.0%
Block     (unknown)  8522   100.0%

tumor_purity
Real number (ℝ)

High correlation  Missing 

Distinct: 13
Distinct (%): 1.5%
Missing: 6042
Missing (%): 87.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 65.942825
Minimum: 10
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 54.3 KiB

Quantile statistics

Minimum: 10
5-th percentile: 30
Q1: 50
median: 70
Q3: 80
95-th percentile: 90
Maximum: 90
Range: 80
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 20.441262
Coefficient of variation (CV): 0.30998463
Kurtosis: -0.77062301
Mean: 65.942825
Median Absolute Deviation (MAD): 10
Skewness: -0.57372366
Sum: 58821
Variance: 417.84521
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)

Common Values

Value             Count  Frequency (%)
80                206    3.0%
90                174    2.5%
60                157    2.3%
40                94     1.4%
70                92     1.3%
30                80     1.2%
50                54     0.8%
85                19     0.3%
15                6      0.1%
10                4      0.1%
Other values (3)  6      0.1%
(Missing)         6042   87.1%

Minimum 10 values

Value  Count  Frequency (%)
10     4      0.1%
15     6      0.1%
20     4      0.1%
30     80     1.2%
40     94     1.4%
50     54     0.8%
60     157    2.3%
63     1      < 0.1%
70     92     1.3%
73     1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
90     174    2.5%
85     19     0.3%
80     206    3.0%
73     1      < 0.1%
70     92     1.3%
63     1      < 0.1%
60     157    2.3%
50     54     0.8%
40     94     1.4%
30     80     1.2%

sample_coverage
Real number (ℝ)

Alerts: High correlation, Missing

Distinct: 101
Distinct (%): 11.6%
Missing: 6064
Missing (%): 87.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 771.92299
Minimum: 106
Maximum: 1135
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 54.3 KiB

Quantile statistics

Minimum: 106
5-th percentile: 425
Q1: 657
median: 808
Q3: 903
95-th percentile: 1132
Maximum: 1135
Range: 1029
Interquartile range (IQR): 246

Descriptive statistics

Standard deviation: 209.96053
Coefficient of variation (CV): 0.27199673
Kurtosis: 0.17939174
Mean: 771.92299
Median Absolute Deviation (MAD): 119
Skewness: -0.38881445
Sum: 671573
Variance: 44083.422
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
1132                    72  1.0%
821                     36  0.5%
597                     32  0.5%
920                     24  0.3%
952                     23  0.3%
780                     21  0.3%
808                     20  0.3%
470                     18  0.3%
794                     18  0.3%
682                     18  0.3%
Other values (91)      588  8.5%
(Missing)             6064  87.5%

Minimum 10 values

Value                Count  Frequency (%)
106                      4  0.1%
182                      8  0.1%
205                      3  < 0.1%
212                      2  < 0.1%
294                      2  < 0.1%
359                      4  0.1%
372                      2  < 0.1%
384                      4  0.1%
391                      1  < 0.1%
392                      6  0.1%

Maximum 10 values

Value                Count  Frequency (%)
1135                     2  < 0.1%
1132                    72  1.0%
1107                     4  0.1%
1085                     5  0.1%
1079                     3  < 0.1%
1072                     2  < 0.1%
1050                     4  0.1%
1031                     8  0.1%
1023                     6  0.1%
1022                     2  < 0.1%

os_months
Text

Alerts: Missing

Distinct: 170
Distinct (%): 5.2%
Missing: 3687
Missing (%): 53.2%
Memory size: 284.8 KiB

Length

Max length: 7
Median length: 4
Mean length: 4.4490299
Min length: 4

Characters and Unicode

Total characters: 14446
Distinct characters: 16
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 19
Unique (%): 0.6%

Sample

1st row: 11.079
2nd row: 11.079
3rd row: 92.351
4th row: 27.518
5th row: 27.518

Words

Value                Count  Frequency (%)
0000                   126  3.9%
0001                    89  2.7%
0003                    85  2.6%
0004                    80  2.5%
0006                    78  2.4%
0002                    77  2.4%
0009                    76  2.3%
0005                    76  2.3%
0010                    74  2.3%
0019                    73  2.2%
Other values (160)    2413  74.3%
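
os_months is typed as Text even though the sample rows look numeric. The most frequent entries (0000, 0001, 0019, ...) are consistent with zero-padded month counts stored next to decimal values such as 11.079, which may be why the column was not inferred as numeric. The sketch below shows one way to coerce such a column, assuming the data is available as a pandas Series. The toy values, including the non-numeric entry "unknown", are hypothetical and only mimic the patterns seen here.

```python
import pandas as pd

# Toy column mixing decimal strings, zero-padded strings, a missing value,
# and one hypothetical non-numeric entry ("unknown").
os_months = pd.Series(["11.079", "0019", "0000", None, "unknown"], name="os_months")

# errors="coerce" turns anything unparseable into NaN instead of raising.
os_months_num = pd.to_numeric(os_months, errors="coerce")

# Entries that were present as text but failed to parse deserve a manual look.
unparsed = os_months[os_months.notna() & os_months_num.isna()]

print(os_months_num.tolist())
print(unparsed.tolist())
```

Reviewing `unparsed` before dropping it helps avoid silently discarding values that encode something meaningful.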

Most occurring characters

Value                Count  Frequency (%)
0                     6205  43.0%
1                     1440  10.0%
2                     1033  7.2%
4                      838  5.8%
.                      818  5.7%
7                      767  5.3%
3                      761  5.3%
9                      664  4.6%
8                      658  4.6%
5                      619  4.3%
Other values (6)       643  4.5%

Most occurring categories

Value                Count  Frequency (%)
(unknown)            14446  100.0%

Most occurring scripts

Value                Count  Frequency (%)
(unknown)            14446  100.0%

Most occurring blocks

Value                Count  Frequency (%)
(unknown)            14446  100.0%

treatment_start
Date

Alerts: Missing

Distinct: 211
Distinct (%): 33.6%
Missing: 6306
Missing (%): 90.9%
Memory size: 54.3 KiB
Minimum: 1897-05-07 00:00:00
Maximum: 1914-03-26 00:00:00
Invalid dates: 0
Invalid dates (%): 0.0%
Histogram with fixed size bins (bins=50)
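
The reported treatment_start range (1897-05-07 to 1914-03-26) is worth a plausibility check before any time-based analysis. The report does not say why the dates fall in this range; date shifting for de-identification or a misparsed offset are possible explanations, but that is speculation. The sketch below flags dates outside an assumed window. The window bounds and the modern contrast date are hypothetical.

```python
import pandas as pd

# Toy dates: the first two are the minimum and maximum reported above,
# the third is a hypothetical modern date added for contrast.
treatment_start = pd.to_datetime(
    pd.Series(["1897-05-07", "1914-03-26", "2015-06-01", None], name="treatment_start")
)

# Assumed plausibility window; adjust to the cohort's real enrolment period.
earliest = pd.Timestamp("1990-01-01")
latest = pd.Timestamp("2025-12-31")

# Missing dates are not flagged; only present dates outside the window are.
implausible = treatment_start.notna() & ~treatment_start.between(earliest, latest)
print(treatment_start[implausible].dt.date.tolist())
```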

os_status
Categorical

Alerts: High correlation, Imbalance, Missing

Distinct: 3
Distinct (%): 0.1%
Missing: 3636
Missing (%): 52.4%
Memory size: 382.8 KiB

DECEASED: 2760
ALIVE: 404
DECEASED_NON_CANCER: 134

Length

Max length: 19
Median length: 8
Mean length: 8.0794421
Min length: 5

Characters and Unicode

Total characters: 26646
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: DECEASED
2nd row: DECEASED
3rd row: ALIVE
4th row: ALIVE
5th row: ALIVE

Common Values

Value                Count  Frequency (%)
DECEASED              2760  39.8%
ALIVE                  404  5.8%
DECEASED_NON_CANCER    134  1.9%
(Missing)             3636  52.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value                Count  Frequency (%)
deceased              2760  83.7%
alive                  404  12.2%
deceased_non_cancer    134  4.1%
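
The two os_status frequency tables use different denominators: 39.8% for DECEASED is a share of all 6934 rows, while 83.7% is a share of the 3298 non-missing rows. A short sketch, assuming pandas, that rebuilds the column from the reported counts and reproduces both sets of percentages:

```python
import pandas as pd

# Rebuild the os_status column from the counts reported above.
counts = {"DECEASED": 2760, "ALIVE": 404, "DECEASED_NON_CANCER": 134}
n_missing = 3636
values = [label for label, n in counts.items() for _ in range(n)] + [None] * n_missing
os_status = pd.Series(values, name="os_status")

# Share of all rows (missing values included in the denominator).
share_of_all = os_status.value_counts(dropna=False, normalize=True).mul(100).round(1)
# Share of non-missing rows only.
share_of_present = os_status.value_counts(normalize=True).mul(100).round(1)

print(share_of_all)
print(share_of_present)
```

Which denominator is appropriate depends on the question: class balance among known outcomes uses the second, data completeness uses the first.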

Most occurring characters

Value                Count  Frequency (%)
E                     9220  34.6%
D                     5788  21.7%
A                     3432  12.9%
C                     3162  11.9%
S                     2894  10.9%
L                      404  1.5%
I                      404  1.5%
V                      404  1.5%
N                      402  1.5%
_                      268  1.0%
Other values (2)       268  1.0%

Most occurring categories

Value                Count  Frequency (%)
(unknown)            26646  100.0%

Most occurring scripts

Value                Count  Frequency (%)
(unknown)            26646  100.0%

Most occurring blocks

Value                Count  Frequency (%)
(unknown)            26646  100.0%

mutated_genes
Text

Alerts: Missing

Distinct: 88
Distinct (%): 2.3%
Missing: 3130
Missing (%): 45.1%
Memory size: 292.2 KiB

Length

Max length: 7
Median length: 3
Mean length: 3.2936383
Min length: 2

Characters and Unicode

Total characters: 12529
Distinct characters: 32
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 14
Unique (%): 0.4%

Sample

1st row: TP53
2nd row: RB1
3rd row: KIT
4th row: MTOR
5th row: SDHB

Words

Value                Count  Frequency (%)
kit                   3123  82.1%
kmt2c                  170  4.5%
pdgfra                  52  1.4%
rb1                     30  0.8%
nf1                     29  0.8%
max                     28  0.7%
braf                    23  0.6%
setd2                   17  0.4%
tp53                    17  0.4%
mga                     16  0.4%
Other values (78)      299  7.9%

Most occurring characters

Value                Count  Frequency (%)
T                     3434  27.4%
K                     3360  26.8%
I                     3166  25.3%
2                      261  2.1%
M                      257  2.1%
C                      246  2.0%
A                      218  1.7%
R                      210  1.7%
P                      195  1.6%
F                      149  1.2%
Other values (22)     1033  8.2%

Most occurring categories

Value                Count  Frequency (%)
(unknown)            12529  100.0%

Most occurring scripts

Value                Count  Frequency (%)
(unknown)            12529  100.0%

Most occurring blocks

Value                Count  Frequency (%)
(unknown)            12529  100.0%

source
Categorical

Alerts: High correlation

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 365.7 KiB

PDMR: 2733
SEER: 2429
CBioPortal: 870
COSMIC: 828
GDC: 74

Length

Max length: 10
Median length: 4
Mean length: 4.9809634
Min length: 3

Characters and Unicode

Total characters: 34538
Distinct characters: 17
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CBioPortal
2nd row: CBioPortal
3rd row: CBioPortal
4th row: CBioPortal
5th row: CBioPortal

Common Values

Value                Count  Frequency (%)
PDMR                  2733  39.4%
SEER                  2429  35.0%
CBioPortal             870  12.5%
COSMIC                 828  11.9%
GDC                     74  1.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value                Count  Frequency (%)
pdmr                  2733  39.4%
seer                  2429  35.0%
cbioportal             870  12.5%
cosmic                 828  11.9%
gdc                     74  1.1%

Most occurring characters

Value                Count  Frequency (%)
R                     5162  14.9%
E                     4858  14.1%
P                     3603  10.4%
M                     3561  10.3%
S                     3257  9.4%
D                     2807  8.1%
C                     2600  7.5%
o                     1740  5.0%
B                      870  2.5%
i                      870  2.5%
Other values (7)      5210  15.1%

Most occurring categories

Value                Count  Frequency (%)
(unknown)            34538  100.0%

Most occurring scripts

Value                Count  Frequency (%)
(unknown)            34538  100.0%

Most occurring blocks

Value                Count  Frequency (%)
(unknown)            34538  100.0%

tumor_grade
Categorical

Alerts: High correlation

Distinct: 4
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 395.0 KiB

High grade: 3872
Unknown: 2436
Intermediate grade: 349
Low grade: 277

Length

Max length: 18
Median length: 10
Mean length: 9.3087684
Min length: 7

Characters and Unicode

Total characters: 64547
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Unknown
2nd row: Unknown
3rd row: Unknown
4th row: Unknown
5th row: Unknown

Common Values

Value                Count  Frequency (%)
High grade            3872  55.8%
Unknown               2436  35.1%
Intermediate grade     349  5.0%
Low grade              277  4.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value                Count  Frequency (%)
grade                 4498  39.3%
high                  3872  33.9%
unknown               2436  21.3%
intermediate           349  3.1%
low                    277  2.4%

Most occurring characters

Value                Count  Frequency (%)
g                     8370  13.0%
n                     7657  11.9%
e                     5545  8.6%
r                     4847  7.5%
d                     4847  7.5%
a                     4847  7.5%
(space)               4498  7.0%
i                     4221  6.5%
H                     3872  6.0%
h                     3872  6.0%
Other values (8)     11971  18.5%

Most occurring categories

Value                Count  Frequency (%)
(unknown)            64547  100.0%

Most occurring scripts

Value                Count  Frequency (%)
(unknown)            64547  100.0%

Most occurring blocks

Value                Count  Frequency (%)
(unknown)            64547  100.0%

Interactions

[Figure: 36 interaction plot panels]

Correlations

Correlation matrix (rows and columns are numbered in the order listed):

 1  Age at Which Sequencing was Reported (Years)
 2  Metastatic Site
 3  age_at_diagnosis
 4  gender
 5  mitotic_rate
 6  os_status
 7  primary_site
 8  race
 9  sample_coverage
10  sample_type
11  source
12  stage_at_diagnosis
13  treatment
14  treatment_response
15  tumor_grade
16  tumor_purity
17  tumor_size

         1       2       3       4       5       6       7       8       9      10      11      12      13      14      15      16      17
 1   1.000   0.364   0.901   0.278   0.076   0.445   0.359   0.232  -0.176   0.384   1.000   0.348   0.228   0.246   1.000  -0.099   0.243
 2   0.364   1.000   0.317   0.480   0.378   0.378   0.447   0.386   0.343   0.865   1.000   0.473   0.171   0.244   1.000   0.362   0.419
 3   0.901   0.317   1.000   0.307   0.206   0.291   0.281   0.181  -0.212   0.426   0.419   0.332   0.370   0.155   0.289  -0.126   0.185
 4   0.278   0.480   0.307   1.000   0.375   0.136   0.450   0.371   0.342   0.407   0.409   0.353   0.437   0.114   0.302   0.190   0.341
 5   0.076   0.378   0.206   0.375   1.000   0.383   0.302   0.303   0.053   0.338   1.000   0.464   0.173   0.220   1.000   0.187  -0.089
 6   0.445   0.378   0.291   0.136   0.383   1.000   0.201   0.210   0.386   0.455   0.629   0.215   0.477   0.273   0.370   0.310   0.352
 7   0.359   0.447   0.281   0.450   0.302   0.201   1.000   0.310   0.334   0.525   0.554   0.483   0.282   0.324   0.402   0.165   0.345
 8   0.232   0.386   0.181   0.371   0.303   0.210   0.310   1.000   0.293   0.453   0.559   0.374   0.331   0.386   0.397   0.316   0.403
 9  -0.176   0.343  -0.212   0.342   0.053   0.386   0.334   0.293   1.000   0.420   1.000   0.333   0.192   0.259   1.000   0.248   0.003
10   0.384   0.865   0.426   0.407   0.338   0.455   0.525   0.453   0.420   1.000   0.629   0.429   0.693   0.359   0.351   0.299   0.292
11   1.000   1.000   0.419   0.409   1.000   0.629   0.554   0.559   1.000   0.629   1.000   0.553   0.823   0.498   0.487   0.085   1.000
12   0.348   0.473   0.332   0.353   0.464   0.215   0.483   0.374   0.333   0.429   0.553   1.000   0.457   0.302   0.330   0.203   0.347
13   0.228   0.171   0.370   0.437   0.173   0.477   0.282   0.331   0.192   0.693   0.823   0.457   1.000   0.412   0.437   0.131   0.159
14   0.246   0.244   0.155   0.114   0.220   0.273   0.324   0.386   0.259   0.359   0.498   0.302   0.412   1.000   0.384   0.228   0.189
15   1.000   1.000   0.289   0.302   1.000   0.370   0.402   0.397   1.000   0.351   0.487   0.330   0.437   0.384   1.000   1.000   1.000
16  -0.099   0.362  -0.126   0.190   0.187   0.310   0.165   0.316   0.248   0.299   0.085   0.203   0.131   0.228   1.000   1.000   0.048
17   0.243   0.419   0.185   0.341  -0.089   0.352   0.345   0.403   0.003   0.292   1.000   0.347   0.159   0.189   1.000   0.048   1.000
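
The "High correlation" alerts in the overview can be read off this matrix by listing every off-diagonal pair whose absolute value reaches a cut-off. The sketch below does this for four of the variables, with values copied from the matrix. The 0.5 threshold is an assumption: the report does not state it, although it is consistent with the alerts listed in the overview.

```python
import pandas as pd

# Four variables copied from the matrix above; the full matrix works the same way.
names = ["age_at_diagnosis", "sample_type", "source", "treatment"]
corr = pd.DataFrame(
    [
        [1.000, 0.426, 0.419, 0.370],
        [0.426, 1.000, 0.629, 0.693],
        [0.419, 0.629, 1.000, 0.823],
        [0.370, 0.693, 0.823, 1.000],
    ],
    index=names,
    columns=names,
)

THRESHOLD = 0.5  # assumed cut-off, not stated in this report

# Walk the upper triangle so each pair is reported once and the diagonal is skipped.
high_pairs = [
    (a, b, float(corr.loc[a, b]))
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if abs(corr.loc[a, b]) >= THRESHOLD
]

for a, b, value in sorted(high_pairs, key=lambda p: -abs(p[2])):
    print(f"{a} ~ {b}: {value:.3f}")
```

Coefficients of exactly 1.000 against source or tumor_grade often reflect fields that are populated by only one data source, so they are worth checking against the missing-value patterns before being read as real associations.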

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
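
The numbers behind these three views can be approximated directly with pandas. The sketch below computes per-column missing counts and a nullity correlation as the Pearson correlation between missingness indicators. The toy frame and its missingness pattern are hypothetical, and the profiling tool's own implementation may differ in details.

```python
import numpy as np
import pandas as pd

# Toy frame with a hypothetical missingness pattern; column names follow the report.
df = pd.DataFrame(
    {
        "tumor_purity": [90.0, np.nan, 60.0, np.nan, np.nan, 80.0],
        "sample_coverage": [661.0, np.nan, 811.0, np.nan, np.nan, 597.0],
        "os_status": ["DECEASED", "ALIVE", None, "DECEASED", None, "ALIVE"],
        "source": ["CBioPortal", "SEER", "CBioPortal", "SEER", "PDMR", "CBioPortal"],
    }
)

is_missing = df.isna()

# Per-column missing counts and percentages (the numbers behind the bar chart).
summary = pd.DataFrame(
    {"missing": is_missing.sum(), "missing_pct": is_missing.mean().mul(100).round(1)}
)

# Nullity correlation: Pearson correlation between missingness indicators.
# Columns that are never (or always) missing have zero variance and are dropped.
varying = is_missing.loc[:, is_missing.nunique() > 1]
nullity_corr = varying.astype(int).corr()

print(summary)
print(nullity_corr.round(2))
```

A nullity correlation near 1 means two columns tend to be missing in the same rows, which is what one would expect for fields supplied by a single source.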

Sample

First rows

(index) | sample_id | patient_id | age_at_diagnosis | Age at Which Sequencing was Reported (Years) | stage_at_diagnosis | tumor_size | mitotic_rate | treatment | treatment_response | primary_site | sample_type | race | gender | Metastatic Site | tumor_purity | sample_coverage | os_months | treatment_start | os_status | mutated_genes | source | tumor_grade
0 | P-0000134-T02-IM3 | P-0000134 | 68.0 | 70.0 | Metastatic | 13.6 | 50.0 | OTHER | UNKNOWN | Stomach | Metastasis | White | Female | Liver | 90.0 | 661.0 | 11.079 | NaN | DECEASED | TP53 | CBioPortal | Unknown
1 | P-0000134-T02-IM3 | P-0000134 | 68.0 | 70.0 | Metastatic | 13.6 | 50.0 | OTHER | UNKNOWN | Stomach | Metastasis | White | Female | Liver | 90.0 | 661.0 | 11.079 | NaN | DECEASED | RB1 | CBioPortal | Unknown
2 | P-0000306-T01-IM3 | P-0000306 | 48.0 | 57.0 | Localized | 13.0 | 5.0 | OTHER | UNKNOWN | Stomach | Primary | White | Male | Not Applicable | 90.0 | 212.0 | 92.351 | NaN | ALIVE | KIT | CBioPortal | Unknown
3 | P-0000501-T02-IM3 | P-0000501 | 37.0 | 42.0 | Localized | 10.2 | 2.0 | OTHER | UNKNOWN | Stomach | Metastasis | Black or African American | Male | Liver | 60.0 | 811.0 | 27.518 | NaN | ALIVE | MTOR | CBioPortal | Unknown
4 | P-0000501-T02-IM3 | P-0000501 | 37.0 | 42.0 | Localized | 10.2 | 2.0 | OTHER | UNKNOWN | Stomach | Metastasis | Black or African American | Male | Liver | 60.0 | 811.0 | 27.518 | NaN | ALIVE | SDHB | CBioPortal | Unknown
5 | P-0001315-T02-IM5 | P-0001315 | 31.0 | 39.0 | Metastatic | 8.0 | 48.0 | OTHER | UNKNOWN | Small Intestine | Metastasis | White | Male | Skin | 90.0 | 1023.0 | 97.348 | NaN | ALIVE | BRAF | CBioPortal | Unknown
6 | P-0001315-T02-IM5 | P-0001315 | 31.0 | 39.0 | Metastatic | 8.0 | 48.0 | OTHER | UNKNOWN | Small Intestine | Metastasis | White | Male | Skin | 90.0 | 1023.0 | 97.348 | NaN | ALIVE | EP300 | CBioPortal | Unknown
7 | P-0001315-T02-IM5 | P-0001315 | 31.0 | 39.0 | Metastatic | 8.0 | 48.0 | OTHER | UNKNOWN | Small Intestine | Metastasis | White | Male | Skin | 90.0 | 1023.0 | 97.348 | NaN | ALIVE | RB1 | CBioPortal | Unknown
8 | P-0001315-T02-IM5 | P-0001315 | 31.0 | 39.0 | Metastatic | 8.0 | 48.0 | OTHER | UNKNOWN | Small Intestine | Metastasis | White | Male | Skin | 90.0 | 1023.0 | 97.348 | NaN | ALIVE | TSC2 | CBioPortal | Unknown
9 | P-0001315-T02-IM5 | P-0001315 | 31.0 | 39.0 | Metastatic | 8.0 | 48.0 | OTHER | UNKNOWN | Small Intestine | Metastasis | White | Male | Skin | 90.0 | 1023.0 | 97.348 | NaN | ALIVE | NF1 | CBioPortal | Unknown

Last rows

(index) | sample_id | patient_id | age_at_diagnosis | Age at Which Sequencing was Reported (Years) | stage_at_diagnosis | tumor_size | mitotic_rate | treatment | treatment_response | primary_site | sample_type | race | gender | Metastatic Site | tumor_purity | sample_coverage | os_months | treatment_start | os_status | mutated_genes | source | tumor_grade
6924 | COSS1477588 | 1351159 | 64.0 | NaN | Unknown | NaN | NaN | IMATINIB | UNKNOWN | GI Tract (Indeterminate) | Unknown | Unknown | Male | NaN | NaN | NaN | NaN | NaN | NaN | KIT | COSMIC | Unknown
6925 | COSS1477581 | 1351152 | 54.0 | NaN | Unknown | NaN | NaN | IMATINIB | UNKNOWN | GI Tract (Indeterminate) | Unknown | Unknown | Male | NaN | NaN | NaN | NaN | NaN | NaN | KIT | COSMIC | Unknown
6926 | COSS1477586 | 1351157 | 34.0 | NaN | Unknown | NaN | NaN | IMATINIB | UNKNOWN | GI Tract (Indeterminate) | Unknown | Unknown | Male | NaN | NaN | NaN | NaN | NaN | NaN | KIT | COSMIC | Unknown
6927 | COSS1477589 | 1351160 | 51.0 | NaN | Unknown | NaN | NaN | IMATINIB | UNKNOWN | GI Tract (Indeterminate) | Unknown | Unknown | Male | NaN | NaN | NaN | NaN | NaN | NaN | KIT | COSMIC | Unknown
6928 | COSS1477584 | 1351155 | 55.0 | NaN | Unknown | NaN | NaN | IMATINIB | UNKNOWN | GI Tract (Indeterminate) | Unknown | Unknown | Male | NaN | NaN | NaN | NaN | NaN | NaN | KIT | COSMIC | Unknown
6929 | COSS909212 | 808927 | 40.0 | NaN | Unknown | NaN | NaN | IMATINIB | PR | Abdomen/Intraabdominal | Metastasis | Unknown | Male | NaN | NaN | NaN | NaN | NaN | NaN | KIT | COSMIC | Unknown
6930 | COSS909213 | 808927 | 40.0 | NaN | Unknown | NaN | NaN | IMATINIB | NR | Abdomen/Intraabdominal | Local Recurrence | Unknown | Male | NaN | NaN | NaN | NaN | NaN | NaN | KIT | COSMIC | Unknown
6931 | COSS909213 | 808927 | 40.0 | NaN | Unknown | NaN | NaN | IMATINIB | NR | Abdomen/Intraabdominal | Local Recurrence | Unknown | Male | NaN | NaN | NaN | NaN | NaN | NaN | KIT | COSMIC | Unknown
6932 | COSS2479667 | 2191917 | 58.0 | NaN | Unknown | NaN | NaN | IMATINIB | NR | Colon/Rectum | Unknown | Unknown | Male | NaN | NaN | NaN | NaN | NaN | NaN | KIT | COSMIC | Unknown
6933 | COSS2479666 | 2191917 | 58.0 | NaN | Unknown | NaN | NaN | IMATINIB | NR | Colon/Rectum | Unknown | Unknown | Male | NaN | NaN | NaN | NaN | NaN | NaN | KIT | COSMIC | Unknown

Duplicate rows

Most frequently occurring

(index) | sample_id | patient_id | age_at_diagnosis | Age at Which Sequencing was Reported (Years) | stage_at_diagnosis | tumor_size | mitotic_rate | treatment | treatment_response | primary_site | sample_type | race | gender | Metastatic Site | tumor_purity | sample_coverage | os_months | treatment_start | os_status | mutated_genes | source | tumor_grade | # duplicates
369 | NaN | 111316 | 58.0 | NaN | Metastatic | NaN | NaN | IMATINIB | UNKNOWN | Liver | Metastasis | White | Male | NaN | NaN | NaN | NaN | NaN | NaN | KIT | PDMR | High grade | 972
371 | NaN | 111316 | 58.0 | NaN | Metastatic | NaN | NaN | RIPRETINIB | UNKNOWN | Liver | Metastasis | White | Male | NaN | NaN | NaN | NaN | NaN | NaN | KIT | PDMR | High grade | 972
376 | NaN | 627122 | 39.0 | NaN | Unknown | NaN | NaN | TREATMENT_NAIVE | UNKNOWN | Stomach | Primary | White | Male | NaN | NaN | NaN | NaN | NaN | NaN | NaN | PDMR | Unknown | 200
382 | NaN | 949853 | 39.0 | NaN | Metastatic | NaN | NaN | TREATMENT_NAIVE | UNKNOWN | Stomach | Primary | White | Male | NaN | NaN | NaN | NaN | NaN | NaN | NaN | PDMR | Unknown | 84
370 | NaN | 111316 | 58.0 | NaN | Metastatic | NaN | NaN | IMATINIB | UNKNOWN | Liver | Metastasis | White | Male | NaN | NaN | NaN | NaN | NaN | NaN | KMT2C | PDMR | High grade | 81
372 | NaN | 111316 | 58.0 | NaN | Metastatic | NaN | NaN | RIPRETINIB | UNKNOWN | Liver | Metastasis | White | Male | NaN | NaN | NaN | NaN | NaN | NaN | KMT2C | PDMR | High grade | 81
374 | NaN | 429767 | 53.0 | NaN | Metastatic | NaN | NaN | IMATINIB | SD | Abdomen/Intraabdominal | Primary | Black or African American | Female | NaN | NaN | NaN | NaN | NaN | NaN | NaN | PDMR | Unknown | 64
375 | NaN | 429767 | 53.0 | NaN | Metastatic | NaN | NaN | NO_CURRENT_THERAPY | UNKNOWN | Abdomen/Intraabdominal | Primary | Black or African American | Female | NaN | NaN | NaN | NaN | NaN | NaN | NaN | PDMR | Unknown | 64
377 | NaN | 636974 | 66.0 | NaN | Metastatic | NaN | NaN | IMATINIB | SD | Abdomen/Intraabdominal | Primary | White | Male | NaN | NaN | NaN | NaN | NaN | NaN | NaN | PDMR | High grade | 49
378 | NaN | 636974 | 66.0 | NaN | Metastatic | NaN | NaN | NO_CURRENT_THERAPY | UNKNOWN | Abdomen/Intraabdominal | Primary | White | Male | NaN | NaN | NaN | NaN | NaN | NaN | NaN | PDMR | High grade | 49
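
A table like this can be approximated by grouping on every column and counting group sizes. The sketch below shows one way to do that with pandas. The toy rows are hypothetical, and the exact counting rule used by the profiling tool is not stated in this report, so the result is an illustration rather than a reproduction.

```python
import numpy as np
import pandas as pd

# Toy rows with hypothetical values; NaN entries mimic the sparse PDMR records.
df = pd.DataFrame(
    {
        "patient_id": ["111316", "111316", "111316", "627122", "627122", "429767"],
        "treatment": [
            "IMATINIB", "IMATINIB", "RIPRETINIB",
            "TREATMENT_NAIVE", "TREATMENT_NAIVE", "IMATINIB",
        ],
        "mutated_genes": ["KIT", "KIT", "KIT", np.nan, np.nan, np.nan],
    }
)

# dropna=False keeps groups whose key contains NaN; by default groupby drops them.
group_sizes = df.groupby(list(df.columns), dropna=False).size()
duplicate_groups = (
    group_sizes[group_sizes > 1]
    .sort_values(ascending=False)
    .rename("# duplicates")
    .reset_index()
)

# Keep one representative of each group of identical rows.
deduplicated = df.drop_duplicates()

print(duplicate_groups)
print(len(df), "->", len(deduplicated))
```

Because most duplicate groups here have a missing sample_id, omitting `dropna=False` would hide exactly the rows that matter.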