Overview

Dataset statistics

Number of variables: 14
Number of observations: 3110
Missing cells: 5996
Missing cells (%): 13.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 358.5 KiB
Average record size in memory: 118.0 B
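
The percentages above can be checked from the raw counts. The following is a minimal sketch of that arithmetic, assuming the missing-cell percentage is taken over all cells (rows × columns) and the average record size is the total memory divided by the row count. The variable names are illustrative and are not part of the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 14
n_observations = 3110
missing_cells = 5996
total_memory_kib = 358.5

# Assumption: the percentage is taken over all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 KiB = 1024 bytes, averaged over all rows.
avg_record_bytes = total_memory_kib * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions both figures round to the values reported above.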

Variable types

Categorical: 2
Text: 4
DateTime: 2
Unsupported: 1
Numeric: 5

Dataset

Description: Dental hospital status (hospital level) (치과병원 현황(병원급))
Author: 행정안전부 (Ministry of the Interior and Safety)
URL: https://data.gg.go.kr/portal/data/service/selectServicePage.do?&infId=35UIB8G2423NNDNAW1332073152&infSeq=1

Alerts

소재지우편번호 (postal code) is highly overall correlated with WGS84위도 (latitude) and 1 other field [High correlation]
WGS84위도 (latitude) is highly overall correlated with 소재지우편번호 (postal code) and 1 other field [High correlation]
WGS84경도 (longitude) is highly overall correlated with 시군명 (city/county name) [High correlation]
시군명 (city/county name) is highly overall correlated with 소재지우편번호 (postal code) and 2 other fields [High correlation]
영업상태명 (business status) is highly imbalanced (77.7%) [Imbalance]
폐업일자 (closure date) has 2803 (90.1%) missing values [Missing]
의료기관종별명 (medical institution type) has 3110 (100.0%) missing values [Missing]
연면적(㎡) (total floor area) is highly skewed (γ1 = 54.98374492) [Skewed]
의료기관종별명 (medical institution type) is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
연면적(㎡) (total floor area) has 33 (1.1%) zeros [Zeros]

Reproduction

Analysis started: 2024-07-13 14:10:23.610258
Analysis finished: 2024-07-13 14:10:33.815513
Duration: 10.21 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json
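
The reported duration is the difference between the two timestamps. A small sketch of that check follows; the format string is an assumption about how the timestamps are written and is not taken from the tool.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
FMT = "%Y-%m-%d %H:%M:%S.%f"  # assumed layout of the timestamps
started = datetime.strptime("2024-07-13 14:10:23.610258", FMT)
finished = datetime.strptime("2024-07-13 14:10:33.815513", FMT)

# Subtracting two datetimes yields a timedelta.
duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```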

Variables

시군명 (city/county name)
Categorical

HIGH CORRELATION

Distinct: 31
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 24.4 KiB

부천시  397
성남시  293
수원시  274
용인시  238
고양시  189
Other values (26)  1719

Length

Max length: 4
Median length: 3
Mean length: 3.0733119
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 가평군
2nd row: 가평군
3rd row: 가평군
4th row: 가평군
5th row: 가평군

Common Values

Value  Count  Frequency (%)
부천시  397  12.8%
성남시  293  9.4%
수원시  274  8.8%
용인시  238  7.7%
고양시  189  6.1%
화성시  163  5.2%
시흥시  154  5.0%
안양시  137  4.4%
남양주시  116  3.7%
평택시  111  3.6%
Other values (21)  1038  33.4%

Length

Histogram of lengths of the category

Distinct: 2412
Distinct (%): 77.6%
Missing: 0
Missing (%): 0.0%
Memory size: 24.4 KiB

Length

Max length: 23
Median length: 21
Mean length: 7.9758842
Min length: 4

Characters and Unicode

Total characters: 24805
Distinct characters: 535
Distinct categories: 10
Distinct scripts: 4
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2073
Unique (%): 66.7%

Sample

1st row: 푸른치과의원
2nd row: 횃불치과의원
3rd row: 올바른 치과의원
4th row: 서울정성치과의원
5th row: 연세세브란스치과의원

Value  Count  Frequency (%)
치과의원  45  1.4%
유디치과의원  15  0.5%
굿모닝치과의원  12  0.4%
서울치과의원  12  0.4%
우리치과의원  10  0.3%
현대치과의원  9  0.3%
서울삼성치과의원  9  0.3%
연세치과의원  9  0.3%
하나치과의원  8  0.2%
이편한치과의원  8  0.2%
Other values (2430)  3082  95.7%

Most occurring characters

ValueCountFrequency (%)
3345
 
13.5%
3232
 
13.0%
3178
 
12.8%
3067
 
12.4%
479
 
1.9%
432
 
1.7%
422
 
1.7%
335
 
1.4%
334
 
1.3%
304
 
1.2%
Other values (525) 9677
39.0%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  24320  98.0%
Decimal Number  148  0.6%
Space Separator  109  0.4%
Uppercase Letter  95  0.4%
Lowercase Letter  53  0.2%
Open Punctuation  28  0.1%
Close Punctuation  28  0.1%
Dash Punctuation  18  0.1%
Other Punctuation  5  < 0.1%
Math Symbol  1  < 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
3345
 
13.8%
3232
 
13.3%
3178
 
13.1%
3067
 
12.6%
479
 
2.0%
432
 
1.8%
422
 
1.7%
335
 
1.4%
334
 
1.4%
304
 
1.2%
Other values (478) 9192
37.8%

Uppercase Letter
Value  Count  Frequency (%)
S  20  21.1%
T  10  10.5%
U  7  7.4%
C  7  7.4%
D  7  7.4%
M  5  5.3%
K  5  5.3%
E  5  5.3%
O  4  4.2%
L  4  4.2%
Other values (9)  21  22.1%

Lowercase Letter
Value  Count  Frequency (%)
e  37  69.8%
h  5  9.4%
i  2  3.8%
r  2  3.8%
u  2  3.8%
a  1  1.9%
o  1  1.9%
l  1  1.9%
y  1  1.9%
n  1  1.9%

Decimal Number
Value  Count  Frequency (%)
3  34  23.0%
5  31  20.9%
6  26  17.6%
1  17  11.5%
2  15  10.1%
0  12  8.1%
8  4  2.7%
7  4  2.7%
4  3  2.0%
9  2  1.4%

Other Punctuation
Value  Count  Frequency (%)
.  3  60.0%
,  1  20.0%
&  1  20.0%

Space Separator
Value  Count  Frequency (%)
(space)  109  100.0%

Open Punctuation
Value  Count  Frequency (%)
(  28  100.0%

Close Punctuation
Value  Count  Frequency (%)
)  28  100.0%

Dash Punctuation
Value  Count  Frequency (%)
-  18  100.0%

Math Symbol
Value  Count  Frequency (%)
+  1  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  24318  98.0%
Common  337  1.4%
Latin  148  0.6%
Han  2  < 0.1%

Most frequent character per script

Hangul
ValueCountFrequency (%)
3345
 
13.8%
3232
 
13.3%
3178
 
13.1%
3067
 
12.6%
479
 
2.0%
432
 
1.8%
422
 
1.7%
335
 
1.4%
334
 
1.4%
304
 
1.3%
Other values (476) 9190
37.8%
Latin
ValueCountFrequency (%)
e 37
25.0%
S 20
13.5%
T 10
 
6.8%
U 7
 
4.7%
C 7
 
4.7%
D 7
 
4.7%
h 5
 
3.4%
M 5
 
3.4%
K 5
 
3.4%
E 5
 
3.4%
Other values (19) 40
27.0%
Common
ValueCountFrequency (%)
109
32.3%
3 34
 
10.1%
5 31
 
9.2%
( 28
 
8.3%
) 28
 
8.3%
6 26
 
7.7%
- 18
 
5.3%
1 17
 
5.0%
2 15
 
4.5%
0 12
 
3.6%
Other values (8) 19
 
5.6%
Han
ValueCountFrequency (%)
1
50.0%
1
50.0%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  24318  98.0%
ASCII  485  2.0%
CJK  2  < 0.1%

Most frequent character per block

Hangul
ValueCountFrequency (%)
3345
 
13.8%
3232
 
13.3%
3178
 
13.1%
3067
 
12.6%
479
 
2.0%
432
 
1.8%
422
 
1.7%
335
 
1.4%
334
 
1.4%
304
 
1.3%
Other values (476) 9190
37.8%
ASCII
ValueCountFrequency (%)
109
22.5%
e 37
 
7.6%
3 34
 
7.0%
5 31
 
6.4%
( 28
 
5.8%
) 28
 
5.8%
6 26
 
5.4%
S 20
 
4.1%
- 18
 
3.7%
1 17
 
3.5%
Other values (37) 137
28.2%
CJK
ValueCountFrequency (%)
1
50.0%
1
50.0%

Distinct: 2504
Distinct (%): 80.5%
Missing: 0
Missing (%): 0.0%
Memory size: 24.4 KiB
Minimum: 1978-05-04 00:00:00
Maximum: 2024-07-02 00:00:00
Histogram with fixed size bins (bins=50)

영업상태명 (business status)
Categorical

IMBALANCE

Distinct: 8
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 24.4 KiB

영업중  2745
폐업  283
운영중  36
폐업 등  18
전출  11
Other values (3)  17

Length

Max length: 4
Median length: 3
Mean length: 2.9102894
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 영업중
2nd row: 영업중
3rd row: 영업중
4th row: 영업중
5th row: 영업중

Common Values

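Value  Count  Frequency (%)
영업중 (operating)  2745  88.3%
폐업 (closed)  283  9.1%
운영중 (in operation)  36  1.2%
폐업 등 (closed, etc.)  18  0.6%
전출 (transferred out)  11  0.4%
휴업 (temporarily closed)  9  0.3%
직권폐업 (closed by official order)  7  0.2%
삭제 (deleted)  1  < 0.1%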
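
The 77.7% imbalance figure in the alert for 영업상태명 can be reproduced from the counts in this table. The sketch below assumes an entropy-based score, 1 − H / log2(k), where H is the Shannon entropy of the class frequencies and k is the number of distinct values. The report does not show how ydata-profiling computes the score, so treat the formula and the function name `imbalance_score` as assumptions that happen to match the reported number.

```python
import math

# Value counts for 영업상태명, copied from the Common Values table above.
counts = [2745, 283, 36, 18, 11, 9, 7, 1]


def imbalance_score(counts):
    """Return 1 - H / log2(k): 0 for a uniform split, near 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no distribution to compare
    # Shannon entropy in bits; empty classes contribute nothing.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


score = imbalance_score(counts)
print(f"Imbalance: {score:.1%}")
```

A perfectly even split such as `imbalance_score([5, 5])` gives 0.0, which is why a column dominated by 영업중 scores so high.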

Length

Histogram of lengths of the category

Common Values (Plot)

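The counts below appear to be per word rather than per value: 폐업 등 contributes to both 폐업 (283 + 18 = 301) and 등 (18).

Value  Count  Frequency (%)
영업중  2745  87.8%
폐업  301  9.6%
운영중  36  1.2%
등  18  0.6%
전출  11  0.4%
휴업  9  0.3%
직권폐업  7  0.2%
삭제  1  < 0.1%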

폐업일자 (closure date)
Date

MISSING

Distinct: 274
Distinct (%): 89.3%
Missing: 2803
Missing (%): 90.1%
Memory size: 24.4 KiB
Minimum: 2005-12-23 00:00:00
Maximum: 2024-06-29 00:00:00
Histogram with fixed size bins (bins=50)

의료기관종별명 (medical institution type)
Unsupported

MISSING  REJECTED  UNSUPPORTED

Missing: 3110
Missing (%): 100.0%
Memory size: 27.5 KiB

의료인수(명) (number of medical staff, persons)
Real number (ℝ)

Distinct: 17
Distinct (%): 0.5%
Missing: 17
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.5738765
Minimum: 0
Maximum: 34
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 27.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
Median: 1
Q3: 2
95-th percentile: 4
Maximum: 34
Range: 34
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.4294954
Coefficient of variation (CV): 0.90826402
Kurtosis: 130.95109
Mean: 1.5738765
Median Absolute Deviation (MAD): 0
Skewness: 8.4675952
Sum: 4868
Variance: 2.0434571
Monotonicity: Not monotonic
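
Several of the statistics for 의료인수(명) are derived from one another. The sketch below recomputes them from the reported figures, assuming the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 − Q1) and assuming the mean is taken over non-missing rows only. The variable names are illustrative.

```python
# Figures copied from the statistics for 의료인수(명) above.
mean = 1.5738765
std = 1.4294954
q1, q3 = 1, 2
minimum, maximum = 0, 34
total = 4868
n_rows, n_missing = 3110, 17

cv = std / mean                # coefficient of variation
variance = std ** 2            # variance is the squared standard deviation
iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum
n_valid = n_rows - n_missing   # assumption: missing rows are excluded from the mean
mean_from_sum = total / n_valid

print(f"CV={cv:.4f} variance={variance:.4f} IQR={iqr} range={value_range}")
print(f"mean recomputed from sum: {mean_from_sum:.7f}")
```

The recomputed mean agrees with the reported one only when the 17 missing rows are left out of the denominator, which supports that assumption.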
Histogram with fixed size bins (bins=17)

Common values

Value  Count  Frequency (%)
1  2136  68.7%
2  586  18.8%
3  210  6.8%
4  75  2.4%
5  35  1.1%
6  16  0.5%
7  11  0.4%
8  9  0.3%
9  4  0.1%
11  3  0.1%
Other values (7)  8  0.3%
(Missing)  17  0.5%

Minimum 10 values

Value  Count  Frequency (%)
0  1  < 0.1%
1  2136  68.7%
2  586  18.8%
3  210  6.8%
4  75  2.4%
5  35  1.1%
6  16  0.5%
7  11  0.4%
8  9  0.3%
9  4  0.1%

Maximum 10 values

Value  Count  Frequency (%)
34  1  < 0.1%
22  1  < 0.1%
20  2  0.1%
19  1  < 0.1%
16  1  < 0.1%
11  3  0.1%
10  1  < 0.1%
9  4  0.1%
8  9  0.3%
7  11  0.4%

Distinct: 429
Distinct (%): 13.9%
Missing: 17
Missing (%): 0.5%
Memory size: 24.4 KiB

Length

Max length: 83
Median length: 70
Mean length: 23.868413
Min length: 2

Characters and Unicode

Total characters: 73825
Distinct characters: 43
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 301
Unique (%): 9.7%

Sample

1st row: 411 410 409 408 407 406 405 404 403 402
2nd row: 404
3rd row: 411 410 409 408 407 406 405 404 403 402
4th row: 412 411 410 409 408 407 406 405 404 403 402
5th row: 412 411 410 409 408 407 406 405 404 403 402

Value  Count  Frequency (%)
403  1735  9.1%
406  1712  9.0%
407  1702  8.9%
405  1643  8.6%
402  1631  8.5%
404  1559  8.2%
411  1545  8.1%
408  1542  8.1%
409  1406  7.4%
401  1262  6.6%
Other values (30)  3364  17.6%

Most occurring characters

Value  Count  Frequency (%)
4  19520  26.4%
0  16309  22.1%
(space)  16008  21.7%
1  6710  9.1%
2  3811  5.2%
3  1834  2.5%
6  1830  2.5%
7  1820  2.5%
5  1758  2.4%
8  1619  2.2%
Other values (33)  2606  3.5%

Most occurring categories

Value  Count  Frequency (%)
Decimal Number  56676  76.8%
Space Separator  16008  21.7%
Other Letter  969  1.3%
Other Punctuation  172  0.2%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
291
30.0%
162
16.7%
54
 
5.6%
46
 
4.7%
46
 
4.7%
28
 
2.9%
28
 
2.9%
27
 
2.8%
27
 
2.8%
24
 
2.5%
Other values (21) 236
24.4%

Decimal Number
Value  Count  Frequency (%)
4  19520  34.4%
0  16309  28.8%
1  6710  11.8%
2  3811  6.7%
3  1834  3.2%
6  1830  3.2%
7  1820  3.2%
5  1758  3.1%
8  1619  2.9%
9  1465  2.6%

Space Separator
Value  Count  Frequency (%)
(space)  16008  100.0%

Other Punctuation
Value  Count  Frequency (%)
,  172  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Common  72856  98.7%
Hangul  969  1.3%

Most frequent character per script

Hangul
ValueCountFrequency (%)
291
30.0%
162
16.7%
54
 
5.6%
46
 
4.7%
46
 
4.7%
28
 
2.9%
28
 
2.9%
27
 
2.8%
27
 
2.8%
24
 
2.5%
Other values (21) 236
24.4%
Common
ValueCountFrequency (%)
4 19520
26.8%
0 16309
22.4%
16008
22.0%
1 6710
 
9.2%
2 3811
 
5.2%
3 1834
 
2.5%
6 1830
 
2.5%
7 1820
 
2.5%
5 1758
 
2.4%
8 1619
 
2.2%
Other values (2) 1637
 
2.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  72856  98.7%
Hangul  969  1.3%

Most frequent character per block

ASCII
ValueCountFrequency (%)
4 19520
26.8%
0 16309
22.4%
16008
22.0%
1 6710
 
9.2%
2 3811
 
5.2%
3 1834
 
2.5%
6 1830
 
2.5%
7 1820
 
2.5%
5 1758
 
2.4%
8 1619
 
2.2%
Other values (2) 1637
 
2.2%
Hangul
ValueCountFrequency (%)
291
30.0%
162
16.7%
54
 
5.6%
46
 
4.7%
46
 
4.7%
28
 
2.9%
28
 
2.9%
27
 
2.8%
27
 
2.8%
24
 
2.5%
Other values (21) 236
24.4%

연면적(㎡) (total floor area, m²)
Real number (ℝ)

SKEWED  ZEROS

Distinct: 2573
Distinct (%): 83.2%
Missing: 17
Missing (%): 0.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 246.09652
Minimum: 0
Maximum: 120985
Zeros: 33
Zeros (%): 1.1%
Negative: 0
Negative (%): 0.0%
Memory size: 27.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 69.292
Q1: 118.94
Median: 166.78
Q3: 241.61
95-th percentile: 450.972
Maximum: 120985
Range: 120985
Interquartile range (IQR): 122.67

Descriptive statistics

Standard deviation: 2180.0451
Coefficient of variation (CV): 8.8584961
Kurtosis: 3045.8166
Mean: 246.09652
Median Absolute Deviation (MAD): 57.77
Skewness: 54.983745
Sum: 761176.54
Variance: 4752596.5
Monotonicity: Not monotonic
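
Part of the gap between the mean (246.1) and the median (166.78) of 연면적(㎡) comes from a single extreme record. The sketch below estimates how far the mean moves if the largest value is left out, using only the reported sum, row counts, and maximum. It assumes the maximum occurs in exactly one row, as the extreme-values listing for this variable indicates. The variable names are illustrative.

```python
# Figures copied from the statistics for 연면적(㎡) above.
total_area = 761176.54      # Sum
n_valid = 3110 - 17         # observations minus missing values
largest = 120985.0          # Maximum, assumed to occur in a single row

mean_all = total_area / n_valid
# Recompute the mean with the single largest record left out.
mean_without_largest = (total_area - largest) / (n_valid - 1)

print(f"mean with all records:       {mean_all:.2f}")
print(f"mean without largest record: {mean_without_largest:.2f}")
```

Checking whether that record is a data-entry error would be a reasonable first cleaning step before relying on the mean or the standard deviation.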
Histogram with fixed size bins (bins=50)

Common values

Value  Count  Frequency (%)
0.0  33  1.1%
165.0  12  0.4%
180.0  11  0.4%
175.0  10  0.3%
1.0  7  0.2%
160.0  7  0.2%
120.0  7  0.2%
132.0  6  0.2%
147.0  6  0.2%
142.0  6  0.2%
Other values (2563)  2988  96.1%
(Missing)  17  0.5%

Minimum 10 values

Value  Count  Frequency (%)
0.0  33  1.1%
1.0  7  0.2%
7.92  1  < 0.1%
15.03  1  < 0.1%
16.38  1  < 0.1%
22.9  1  < 0.1%
25.2  1  < 0.1%
30.0  1  < 0.1%
31.28  1  < 0.1%
33.44  1  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
120985.0  1  < 0.1%
4990.79  1  < 0.1%
3291.74  1  < 0.1%
3058.42  1  < 0.1%
2220.5  1  < 0.1%
1924.78  1  < 0.1%
1673.66  1  < 0.1%
1573.17  1  < 0.1%
1553.82  1  < 0.1%
1550.28  1  < 0.1%

Distinct: 3072
Distinct (%): 98.8%
Missing: 0
Missing (%): 0.0%
Memory size: 24.4 KiB