Overview

Dataset statistics

Number of variables: 20
Number of observations: 10000
Missing cells: 27401
Missing cells (%): 13.7%
Duplicate rows: 199
Duplicate rows (%): 2.0%
Total size in memory: 1.6 MiB
Average record size in memory: 173.0 B
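
The two percentage figures follow directly from the counts in this block. A minimal sketch of the arithmetic, assuming the missing-cell share is taken over all cells (variables × observations) and the duplicate share over all rows; the variable names are illustrative only:

```python
# Counts copied from the "Dataset statistics" block of this report.
n_variables = 20
n_observations = 10_000
missing_cells = 27_401
duplicate_rows = 199

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
```

With one decimal place, 1.99% of rows is displayed as 2.0%, which is why the rounded duplicate share looks slightly higher than 199 / 10000.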

Variable types

Text: 12
Categorical: 4
Numeric: 4

Alerts

Dataset has 199 (2.0%) duplicate rows [Duplicates]
건축면적 (building area) is highly overall correlated with 대지면적 (site area) and 2 other fields [High correlation]
대지면적 is highly overall correlated with 건축면적 [High correlation]
연면적 (total floor area) is highly overall correlated with 건축면적 and 1 other field [High correlation]
지상층수 (floors above ground) is highly overall correlated with 건축면적 and 1 other field [High correlation]
지하층수 (basement floors) is highly imbalanced (56.4%) [Imbalance]
호명칭 (unit name) is highly imbalanced (55.5%) [Imbalance]
건물명 (building name) has 7357 (73.6%) missing values [Missing]
동명칭 (block name) has 7183 (71.8%) missing values [Missing]
대지면적 has 1325 (13.2%) missing values [Missing]
허가일 (permit date) has 3953 (39.5%) missing values [Missing]
착공일 (construction start date) has 5021 (50.2%) missing values [Missing]
사용승인일 (use approval date) has 2344 (23.4%) missing values [Missing]
건축면적 is highly skewed (γ1 = 36.60130918) [Skewed]
대지면적 is highly skewed (γ1 = 27.30527559) [Skewed]
연면적 is highly skewed (γ1 = 32.56567845) [Skewed]
대지면적 has 4153 (41.5%) zeros [Zeros]
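
The Skewed and Zeros alerts rest on two simple column statistics. The sketch below computes the moment coefficient of skewness (γ1) and the share of zeros for a small hypothetical list of areas. It uses the population form m3 / m2^1.5; the profiler may apply a sample-size correction, so this is not expected to reproduce the reported γ1 values digit for digit. The function names and the data are assumptions made for illustration.

```python
from statistics import fmean


def skewness(values):
    """Moment coefficient of skewness, population form: m3 / m2**1.5."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Hypothetical areas in square metres: mostly small lots plus one very large site.
areas = [0, 0, 0, 85, 90, 120, 150, 200, 330, 25000]
symmetric = [1, 2, 3, 4, 5]

print(f"skewness(areas)     = {skewness(areas):.3f}")
print(f"skewness(symmetric) = {skewness(symmetric):.3f}")
print(f"zero share (areas)  = {zero_share(areas):.1%}")
```

A single very large value among many small ones is enough to push γ1 well above zero, which is the pattern the three area columns show. Zeros in an area column may stand in for unrecorded values, so they are worth checking alongside the Missing alert for 대지면적.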

Reproduction

Analysis started: 2024-01-09 22:46:33.717394
Analysis finished: 2024-01-09 22:46:37.975608
Duration: 4.26 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
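
The reported duration is the difference between the two timestamps above, rounded to two decimals. A short check using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2024-01-09 22:46:33.717394")
finished = datetime.fromisoformat("2024-01-09 22:46:37.975608")

duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```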

Variables

Distinct: 6364
Distinct (%): 63.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 21
Median length: 21
Mean length: 21
Min length: 21

Characters and Unicode

Total characters: 210000
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4597
Unique (%): 46.0%
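
Distinct and Unique measure different things here: the report lists 6364 distinct values but only 4597 unique ones. A minimal sketch of the distinction, assuming "distinct" counts every different value and "unique" counts only values that occur exactly once; the sample column is hypothetical:

```python
from collections import Counter

# Hypothetical column: "A" and "C" repeat, "B" and "D" occur once.
values = ["A", "A", "B", "C", "C", "C", "D"]

counts = Counter(values)
distinct = len(counts)                              # different values seen
unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

print(f"Distinct: {distinct} ({distinct / len(values):.1%})")
print(f"Unique:   {unique} ({unique / len(values):.1%})")
```

Under that reading, the gap of 6364 − 4597 = 1767 values for this variable corresponds to values that appear in more than one row.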

Sample

1st row: 4427011100-1-01810004
2nd row: 4427010500-1-06350000
3rd row: 4427025333-1-00010245
4th row: 4427010100-1-02440010
5th row: 4427010700-3-18080000

Common values

Value                    Count   Frequency (%)
4427010100-3-10610000    65      0.7%
4427010100-3-14510000    53      0.5%
4427025321-3-05100000    52      0.5%
4427010400-3-12510000    52      0.5%
4427025328-3-09870000    51      0.5%
4427010700-3-18080000    49      0.5%
4427010200-3-11170000    43      0.4%
4427010400-3-12570000    40      0.4%
4427010400-3-12480000    39      0.4%
4427010400-3-12550000    36      0.4%
Other values (6354)      9520    95.2%

Most occurring characters

Value   Count   Frequency (%)
0       70428   33.5%
1       25202   12.0%
4       24606   11.7%
2       24521   11.7%
-       20000   9.5%
7       13478   6.4%
3       11902   5.7%
5       9246    4.4%
6       4404    2.1%
8       3317    1.6%

Most occurring categories

Value              Count    Frequency (%)
Decimal Number     190000   90.5%
Dash Punctuation   20000    9.5%
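
The category breakdown above can be approximated with Python's `unicodedata` module, which exposes the Unicode general category of each code point (`Nd` for Decimal Number, `Pd` for Dash Punctuation). This is a sketch over the five sample values listed for this variable, not the full 10000-row column, and it does not claim to mirror the profiler's internal implementation:

```python
import unicodedata
from collections import Counter

# The five sample values shown in the "Sample" block for this variable.
samples = [
    "4427011100-1-01810004",
    "4427010500-1-06350000",
    "4427025333-1-00010245",
    "4427010100-1-02440010",
    "4427010700-3-18080000",
]

# Count the general category of every character across all samples.
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)
total = sum(category_counts.values())

for code, count in category_counts.most_common():
    print(f"{code}  {count}  {100 * count / total:.1f}%")
```

Because every value has the same 21-character shape (19 digits and 2 dashes), the shares on the sample match the 90.5% / 9.5% split reported for the whole column.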

Most frequent character per category

Decimal Number

Value   Count   Frequency (%)
0       70428   37.1%
1       25202   13.3%
4       24606   13.0%
2       24521   12.9%
7       13478   7.1%
3       11902   6.3%
5       9246    4.9%
6       4404    2.3%
8       3317    1.7%
9       2896    1.5%

Dash Punctuation

Value   Count   Frequency (%)
-       20000   100.0%

Most occurring scripts

Value    Count    Frequency (%)
Common   210000   100.0%

Most frequent character per script

Common

Value   Count   Frequency (%)
0       70428   33.5%
1       25202   12.0%
4       24606   11.7%
2       24521   11.7%
-       20000   9.5%
7       13478   6.4%
3       11902   5.7%
5       9246    4.4%
6       4404    2.1%
8       3317    1.6%

Most occurring blocks

Value   Count    Frequency (%)
ASCII   210000   100.0%

Most frequent character per block

ASCII

Value   Count   Frequency (%)
0       70428   33.5%
1       25202   12.0%
4       24606   11.7%
2       24521   11.7%
-       20000   9.5%
7       13478   6.4%
3       11902   5.7%
5       9246    4.4%
6       4404    2.1%
8       3317    1.6%

Distinct: 6360
Distinct (%): 63.6%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 50
Median length: 35
Mean length: 19.6609
Min length: 12

Characters and Unicode

Total characters: 196609
Distinct characters: 94
Distinct categories: 8
Distinct scripts: 3
Distinct blocks: 2

Unique

Unique: 4591
Unique (%): 45.9%

Sample

1st row: 충청남도 당진시 구룡동 181-4
2nd row: 충청남도 당진시 시곡동 635
3rd row: 충청남도 당진시 송악읍 복운리 1-245
4th row: 충청남도 당진시 읍내동 244-10
5th row: 충청남도 당진시 대덕동 1808

Common values

Value                 Count   Frequency (%)
충청남도              10000   22.0%
당진시                10000   22.0%
송악읍                2861    6.3%
합덕읍                2298    5.1%
읍내동                2222    4.9%
운산리                1011    2.2%
채운동                668     1.5%
복운리                572     1.3%
원당동                492     1.1%
대덕동                447     1.0%
Other values (4906)   14858   32.7%

Most occurring characters

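Value               Count   Frequency (%)
(space)             35429   18.0%
[not recovered]     10566   5.4%
[not recovered]     10493   5.3%
[not recovered]     10391   5.3%
[not recovered]     10275   5.2%
[not recovered]     10184   5.2%
[not recovered]     10000   5.1%
[not recovered]     10000   5.1%
1                   8342    4.2%
[not recovered]     7381    3.8%
Other values (84)   73548   37.4%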

Most occurring categories

Value               Count    Frequency (%)
Other Letter        116272   59.1%
Decimal Number      38956    19.8%
Space Separator     35429    18.0%
Dash Punctuation    5937     3.0%
Other Punctuation   9        < 0.1%
Uppercase Letter    2        < 0.1%
Open Punctuation    2        < 0.1%
Close Punctuation   2        < 0.1%

Most frequent character per category

Other Letter

Value               Count   Frequency (%)
[not recovered]     10566   9.1%
[not recovered]     10493   9.0%
[not recovered]     10391   8.9%
[not recovered]     10275   8.8%
[not recovered]     10184   8.8%
[not recovered]     10000   8.6%
[not recovered]     10000   8.6%
[not recovered]     7381    6.3%
[not recovered]     5165    4.4%
[not recovered]     4899    4.2%
Other values (68)   26918   23.2%

Decimal Number

Value   Count   Frequency (%)
1       8342    21.4%
2       5375    13.8%
3       4000    10.3%
6       3828    9.8%
5       3666    9.4%
4       3498    9.0%
9       2684    6.9%
8       2619    6.7%
7       2516    6.5%
0       2428    6.2%

Space Separator

Value     Count   Frequency (%)
(space)   35429   100.0%

Dash Punctuation

Value   Count   Frequency (%)
-       5937    100.0%

Other Punctuation

Value   Count   Frequency (%)
,       9       100.0%

Uppercase Letter

Value   Count   Frequency (%)
A       2       100.0%

Open Punctuation

Value   Count   Frequency (%)
(       2       100.0%

Close Punctuation

Value   Count   Frequency (%)
)       2       100.0%

Most occurring scripts

Value    Count    Frequency (%)
Hangul   116272   59.1%
Common   80335    40.9%
Latin    2        < 0.1%

Most frequent character per script

Hangul

Value               Count   Frequency (%)
[not recovered]     10566   9.1%
[not recovered]     10493   9.0%
[not recovered]     10391   8.9%
[not recovered]     10275   8.8%
[not recovered]     10184   8.8%
[not recovered]     10000   8.6%
[not recovered]     10000   8.6%
[not recovered]     7381    6.3%
[not recovered]     5165    4.4%
[not recovered]     4899    4.2%
Other values (68)   26918   23.2%

Common

Value              Count   Frequency (%)
(space)            35429   44.1%
1                  8342    10.4%
-                  5937    7.4%
2                  5375    6.7%
3                  4000    5.0%
6                  3828    4.8%
5                  3666    4.6%
4                  3498    4.4%
9                  2684    3.3%
8                  2619    3.3%
Other values (5)   4957    6.2%

Latin

Value   Count   Frequency (%)
A       2       100.0%

Most occurring blocks

Value    Count    Frequency (%)
Hangul   116272   59.1%
ASCII    80337    40.9%
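
A Unicode block is a contiguous range of code points, so the block split above can be approximated with plain range checks. The sketch below assumes two ranges only: Basic Latin (U+0000 to U+007F), reported here as ASCII, and Hangul Syllables (U+AC00 to U+D7AF). The `block_of` helper is hypothetical, and the input is the five sample addresses listed for this variable rather than the full column, so the shares differ from the reported 59.1% / 40.9%.

```python
from collections import Counter


def block_of(ch):
    """Map a character to a coarse block label using code-point ranges."""
    cp = ord(ch)
    if cp <= 0x7F:
        return "ASCII"   # Basic Latin block
    if 0xAC00 <= cp <= 0xD7AF:
        return "Hangul"  # Hangul Syllables block
    return "Other"       # anything this sketch does not classify


# The five sample values shown in the "Sample" block for this variable.
samples = [
    "충청남도 당진시 구룡동 181-4",
    "충청남도 당진시 시곡동 635",
    "충청남도 당진시 송악읍 복운리 1-245",
    "충청남도 당진시 읍내동 244-10",
    "충청남도 당진시 대덕동 1808",
]

block_counts = Counter(block_of(ch) for value in samples for ch in value)
total = sum(block_counts.values())

for block, count in block_counts.most_common():
    print(f"{block}  {count}  {100 * count / total:.1f}%")
```

Spaces, digits, and hyphens all fall in the ASCII block, which is why ASCII takes a large share even though the addresses are written in Korean.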

Most frequent character per block

ASCII

Value              Count   Frequency (%)
(space)            35429   44.1%
1                  8342    10.4%
-                  5937    7.4%
2                  5375    6.7%
3                  4000    5.0%
6                  3828    4.8%
5                  3666    4.6%
4                  3498    4.4%
9                  2684    3.3%
8                  2619    3.3%
Other values (6)   4959    6.2%

Hangul

Value               Count   Frequency (%)
[not recovered]     10566   9.1%
[not recovered]     10493   9.0%
[not recovered]     10391   8.9%
[not recovered]     10275   8.8%
[not recovered]     10184   8.8%
[not recovered]     10000   8.6%
[not recovered]     10000   8.6%
[not recovered]     7381    6.3%
[not recovered]     5165    4.4%
[not recovered]     4899    4.2%
Other values (68)   26918   23.2%

Distinct: 3947
Distinct (%): 39.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 27
Median length: 25
Mean length: 14.3187
Min length: 1

Characters and Unicode

Total characters: 143187
Distinct characters: 194
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2

Unique

Unique: 2698
Unique (%): 27.0%

Sample

1st row: 충청남도 당진시 면천로 98-23
2nd row: 충청남도 당진시 시곡로 161-208
3rd row: (blank)
4th row: 충청남도 당진시 당진중앙1로 168
5th row: 충청남도 당진시 남부로 200

Common values

Value                 Count   Frequency (%)
충청남도              6757    22.4%
당진시                6757    22.4%
송악읍                1909    6.3%
합덕읍                1244    4.1%
당진중앙2로           287     1.0%
밤절로                187     0.6%
반촌로                174     0.6%
남부로                149     0.5%
서해로                135     0.4%
당진중앙1로           134     0.4%
Other values (2706)   12451   41.3%