Overview

Dataset statistics

Number of variables: 11
Number of observations: 10000
Missing cells: 515
Missing cells (%): 0.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1005.9 KiB
Average record size in memory: 103.0 B
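
The derived figures above can be cross-checked from the raw counts. The following is a minimal sketch, assuming that the missing-cell percentage is taken over all cells (rows × columns) and that 1 KiB = 1024 bytes; the variable names are illustrative and not part of the report.

```python
# Raw figures copied from the dataset statistics above.
n_vars = 11
n_obs = 10_000
missing_cells = 515
total_kib = 1005.9

# Assumption: the percentage is computed over all cells (rows x columns).
total_cells = n_vars * n_obs
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_kib * 1024 / n_obs

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the figures shown in the report (0.5% and 103.0 B).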

Variable types

Text: 4
Categorical: 2
Numeric: 5

Dataset

Description: 관리_호별_전유_공용_면적_pk, 관리_호별_명세_pk, 평형_구분_명, 전유_공용_구분_코드, 주_부속_구분_코드, 층_구분_코드, 층_번호, 구조_코드, 주_용도_코드, 기타_용도, 면적
Author: Seoul Metropolitan Government (서울특별시)
URL: https://data.seoul.go.kr/dataList/OA-15662/S/1/datasetView.do

Alerts

High correlation: 층_구분_코드 is highly overall correlated with 층_번호 and 1 other field
High correlation: 층_번호 is highly overall correlated with 층_구분_코드
High correlation: 전유_공용_구분_코드 is highly overall correlated with 층_구분_코드
Imbalance: 주_부속_구분_코드 is highly imbalanced (77.0%)
Missing: 기타_용도 has 515 (5.1%) missing values
Skewed: 면적 is highly skewed (γ1 = 38.17277674)
Unique: 관리_호별_전유_공용_면적_pk has unique values
Zeros: 층_번호 has 4819 (48.2%) zeros
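
The skewness and imbalance alerts rest on simple summary statistics. The following is a minimal sketch of two common definitions: the moment-based Fisher-Pearson skewness g1 = m3 / m2^1.5, and an entropy-based imbalance score 1 - H / log2(k). Both formulas are assumptions about how such scores are typically computed; the profiler may use a bias-adjusted skewness estimator, so values need not match the report exactly. The function names and sample inputs are hypothetical.

```python
import math


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def imbalance(counts):
    """Entropy-based imbalance: 0 for balanced classes, near 1 when one class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# Hypothetical inputs, not taken from the dataset.
print(skewness([1, 2, 3]))          # symmetric sample, skewness 0
print(skewness([0, 0, 0, 0, 10]))   # long right tail, positive skewness
print(imbalance([50, 50]))          # perfectly balanced, score 0
print(imbalance([9999, 1]))         # one dominant class, score close to 1
```

A large positive g1, as reported for 면적, indicates a long right tail: a few very large areas next to many small ones.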

Reproduction

Analysis started: 2024-07-13 18:52:06.828839
Analysis finished: 2024-07-13 18:52:10.772484
Duration: 3.94 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
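
The reported duration follows directly from the two timestamps. A minimal sketch using only the Python standard library, assuming both timestamps share the same time zone:

```python
from datetime import datetime

# Timestamps copied from the reproduction details above.
started = datetime.fromisoformat("2024-07-13 18:52:06.828839")
finished = datetime.fromisoformat("2024-07-13 18:52:10.772484")

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```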

Variables

관리_호별_전유_공용_면적_pk

Distinct: 10000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 15
Mean length: 14.0851
Min length: 8

Characters and Unicode

Total characters: 140851
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
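
As an illustration of the character properties mentioned above, the following minimal sketch counts Unicode general categories for the five sample values of this variable, using Python's standard `unicodedata` module. The standard library exposes general categories (for example `Nd` for Decimal Number and `Pd` for Dash Punctuation) but not scripts or blocks, so only categories are shown.

```python
import unicodedata
from collections import Counter

# The five sample rows listed for this variable in the report.
samples = [
    "11000-100140792",
    "11000-114673",
    "11000-107512",
    "11000-100005708",
    "11000-100042315",
]

# Count the Unicode general category of every character.
category_counts = Counter(
    unicodedata.category(ch) for value in samples for ch in value
)

for category, count in category_counts.most_common():
    print(category, count)
```

On these samples only two categories appear, which is consistent with the two distinct categories reported for the full column.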

Unique

Unique: 10000
Unique (%): 100.0%

Sample

1st row: 11000-100140792
2nd row: 11000-114673
3rd row: 11000-107512
4th row: 11000-100005708
5th row: 11000-100042315

Value                  Count   Frequency (%)
11000-100140792        1       < 0.1%
11000-100032940        1       < 0.1%
11000-100013127        1       < 0.1%
11000-100035747        1       < 0.1%
11000-105852           1       < 0.1%
11000-100041078        1       < 0.1%
11000-117737           1       < 0.1%
11000-100043991        1       < 0.1%
11000-100045936        1       < 0.1%
11000-111966           1       < 0.1%
Other values (9990)    9990    99.9%
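
A frequency table of this shape (the ten most common values plus an aggregated "Other values" row) can be sketched as follows. This is an illustrative reconstruction, not the profiler's own code; `frequency_table` and its `top_n` parameter are hypothetical names.

```python
from collections import Counter


def frequency_table(values, top_n=10):
    """Return (value, count, percent) rows for the top_n values plus an 'Other values' row."""
    counts = Counter(values)
    total = len(values)
    rows = [(v, c, 100 * c / total) for v, c in counts.most_common(top_n)]
    shown = sum(c for _, c, _ in rows)
    n_other = len(counts) - len(rows)
    if n_other > 0:
        # Aggregate every value outside the top_n into one row.
        other = total - shown
        rows.append((f"Other values ({n_other})", other, 100 * other / total))
    return rows


# Hypothetical input, not taken from the dataset.
for row in frequency_table(["a"] * 3 + ["b"] * 2 + ["c"], top_n=2):
    print(row)
```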

Most occurring characters

Value   Count    Frequency (%)
0       54904    39.0%
1       38779    27.5%
-       10000    7.1%
4       5704     4.0%
3       5488     3.9%
2       5242     3.7%
5       5184     3.7%
6       4136     2.9%
7       3928     2.8%
9       3765     2.7%

Most occurring categories

Value               Count     Frequency (%)
Decimal Number      130851    92.9%
Dash Punctuation    10000     7.1%

Most frequent character per category

Decimal Number

Value   Count    Frequency (%)
0       54904    42.0%
1       38779    29.6%
4       5704     4.4%
3       5488     4.2%
2       5242     4.0%
5       5184     4.0%
6       4136     3.2%
7       3928     3.0%
9       3765     2.9%
8       3721     2.8%

Dash Punctuation

Value   Count    Frequency (%)
-       10000    100.0%

Most occurring scripts

Value    Count     Frequency (%)
Common   140851    100.0%

Most frequent character per script

Common

Value   Count    Frequency (%)
0       54904    39.0%
1       38779    27.5%
-       10000    7.1%
4       5704     4.0%
3       5488     3.9%
2       5242     3.7%
5       5184     3.7%
6       4136     2.9%
7       3928     2.8%
9       3765     2.7%

Most occurring blocks

Value   Count     Frequency (%)
ASCII   140851    100.0%

Most frequent character per block

ASCII

Value   Count    Frequency (%)
0       54904    39.0%
1       38779    27.5%
-       10000    7.1%
4       5704     4.0%
3       5488     3.9%
2       5242     3.7%
5       5184     3.7%
6       4136     2.9%
7       3928     2.8%
9       3765     2.7%

관리_호별_명세_pk

Distinct: 7799
Distinct (%): 78.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 28
Median length: 15
Mean length: 13.7814
Min length: 7

Characters and Unicode

Total characters: 137814
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5956
Unique (%): 59.6%
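
For this variable the distinct count (7799) and the unique count (5956) differ. A minimal sketch of the distinction, assuming that "distinct" counts the different values present and "unique" counts the values that occur exactly once; the function name and input are hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values occurring exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical input: "a" repeats, so it is distinct but not unique.
print(distinct_and_unique(["a", "a", "b", "c"]))
```

Under that reading, 7799 - 5956 = 1843 values of this key appear in more than one row.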

Sample

1st row: 11000-100032629
2nd row: 11000-20530
3rd row: 11000-19404
4th row: 11000-100002176
5th row: 11000-100010847

Value                  Count   Frequency (%)
11000-100033514        17      0.2%
11000-100030210        8       0.1%
11000-100033103        7       0.1%
11000-100033511        7       0.1%
11000-100032236        7       0.1%
11000-100033510        6       0.1%
11000-100032233        6       0.1%
11000-100033036        5       < 0.1%
11000-100032014        5       < 0.1%
11000-100032009        5       < 0.1%
Other values (7789)    9927    99.3%

Most occurring characters

Value   Count    Frequency (%)
0       58159    42.2%
1       35279    25.6%
-       10000    7.3%
2       6638     4.8%
3       5476     4.0%
9       4404     3.2%
8       4117     3.0%
4       3956     2.9%
5       3530     2.6%
7       3269     2.4%

Most occurring categories

Value               Count     Frequency (%)
Decimal Number      127814    92.7%
Dash Punctuation    10000     7.3%

Most frequent character per category

Decimal Number

Value   Count    Frequency (%)
0       58159    45.5%
1       35279    27.6%
2       6638     5.2%
3       5476     4.3%
9       4404     3.4%
8       4117     3.2%
4       3956     3.1%
5       3530     2.8%
7       3269     2.6%
6       2986     2.3%

Dash Punctuation

Value   Count    Frequency (%)
-       10000    100.0%

Most occurring scripts

Value    Count     Frequency (%)
Common   137814    100.0%

Most frequent character per script

Common

Value   Count    Frequency (%)
0       58159    42.2%
1       35279    25.6%
-       10000    7.3%
2       6638     4.8%
3       5476     4.0%
9       4404     3.2%
8       4117     3.0%
4       3956     2.9%
5       3530     2.6%
7       3269     2.4%

Most occurring blocks

Value   Count     Frequency (%)
ASCII   137814    100.0%

Most frequent character per block

ASCII

Value   Count    Frequency (%)
0       58159    42.2%
1       35279    25.6%
-       10000    7.3%
2       6638     4.8%
3       5476     4.0%
9       4404     3.2%
8       4117     3.0%
4       3956     2.9%
5       3530     2.6%
7       3269     2.4%

평형_구분_명

Distinct: 1984
Distinct (%): 19.8%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB