Overview

Dataset statistics

Number of variables: 19
Number of observations: 10000
Missing cells: 12819
Missing cells (%): 6.7%
Duplicate rows: 11
Duplicate rows (%): 0.1%
Total size in memory: 1.6 MiB
Average record size in memory: 164.0 B
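
The percentages and the total size above follow from the raw counts. The sketch below recomputes them as a sanity check. It assumes that "Missing cells (%)" is taken over variables × observations and that the total size is the average record size times the row count; both readings are inferred from the figures, not taken from the tool's documentation.

```python
# Figures copied from the "Dataset statistics" block of this report.
n_variables = 19
n_observations = 10_000
missing_cells = 12_819
duplicate_rows = 11
avg_record_bytes = 164.0

# Assumption: the missing-cell percentage is relative to all cells in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Duplicate rows are relative to the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: total size = average record size * number of rows (1 MiB = 2**20 bytes).
total_mib = avg_record_bytes * n_observations / 2**20

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Rounded to one decimal place, the three values agree with the reported 6.7%, 0.1% and 1.6 MiB.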

Variable types

Numeric: 1
Text: 2
Unsupported: 2
Categorical: 13
Boolean: 1

Dataset

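Description: Orthoimage production records from the aerial photograph metadata of the National Geographic Information Institute (국토지리정보원), covering the 1:5,000 map sheet number, 1:5,000 map sheet name, scale, origin classification, production date, and other fields.
Author: National Geographic Information Institute, Ministry of Land, Infrastructure and Transport (국토교통부 국토지리정보원)
URL: https://www.data.go.kr/data/15067667/fileData.do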

Alerts

제작기관 has constant value "" (Constant)
Dataset has 11 (0.1%) duplicate rows (Duplicates)
작업기관 has a high cardinality: 51 distinct values (High cardinality)
예비 is highly overall correlated with 도엽번호5000 and 10 other fields (High correlation)
참고사항 is highly overall correlated with 도엽번호5000 and 12 other fields (High correlation)
칼라구분 is highly overall correlated with 지상표본거리 and 6 other fields (High correlation)
원시영상자료유무 is highly overall correlated with 작업기관 and 2 other fields (High correlation)
구축구분코드 is highly overall correlated with 작업기관 and 5 other fields (High correlation)
작업기관 is highly overall correlated with 지상표본거리 and 11 other fields (High correlation)
사용수치표고 is highly overall correlated with 작업기관 and 6 other fields (High correlation)
카메라종류 is highly overall correlated with 지상표본거리 and 9 other fields (High correlation)
지상표본거리 is highly overall correlated with 작업기관 and 6 other fields (High correlation)
해상도완화지역여부 is highly overall correlated with 작업기관 and 6 other fields (High correlation)
원시지리좌표계 is highly overall correlated with 지상표본거리 and 7 other fields (High correlation)
지리좌표계 is highly overall correlated with 도엽번호5000 and 12 other fields (High correlation)
원점구분 is highly overall correlated with 작업기관 and 6 other fields (High correlation)
도엽번호5000 is highly overall correlated with 참고사항 and 2 other fields (High correlation)
지상표본거리 is highly imbalanced (76.7%) (Imbalance)
원점구분 is highly imbalanced (54.9%) (Imbalance)
원시영상자료유무 is highly imbalanced (85.7%) (Imbalance)
참고사항 is highly imbalanced (97.6%) (Imbalance)
예비 is highly imbalanced (97.6%) (Imbalance)
칼라구분 is highly imbalanced (82.8%) (Imbalance)
지리좌표계 is highly imbalanced (88.7%) (Imbalance)
원시지리좌표계 is highly imbalanced (80.6%) (Imbalance)
구축구분코드 is highly imbalanced (75.2%) (Imbalance)
해상도완화지역여부 is highly imbalanced (85.0%) (Imbalance)
도엽명5000 has 248 (2.5%) missing values (Missing)
축적 has 247 (2.5%) missing values (Missing)
원시영상자료유무 has 2521 (25.2%) missing values (Missing)
워성종류 has 9803 (98.0%) missing values (Missing)
축적 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)
워성종류 is an unsupported type, check if it needs cleaning or further analysis (Unsupported)

Reproduction

Analysis started: 2023-12-12 14:25:05.038302
Analysis finished: 2023-12-12 14:25:07.514359
Duration: 2.48 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

도엽번호5000
Real number (ℝ)

HIGH CORRELATION 

Distinct: 7928
Distinct (%): 79.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 36430546
Minimum: 32514040
Maximum: 38815100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 166.0 KiB

Quantile statistics

Minimum: 32514040
5th percentile: 34605095
Q1: 35706092
Median: 36616072
Q3: 37612006
95th percentile: 37815004
Maximum: 38815100
Range: 6301060
Interquartile range (IQR): 1905914.8

Descriptive statistics

Standard deviation: 1145460.5
Coefficient of variation (CV): 0.031442308
Kurtosis: -0.53942582
Mean: 36430546
Median Absolute Deviation (MAD): 914069.5
Skewness: -0.24579447
Sum: 3.6430546 × 10^11
Variance: 1.3120797 × 10^12
Monotonicity: Not monotonic
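
Several of the descriptive statistics above are simple functions of one another. The following sketch recomputes the coefficient of variation, variance, range and sum from the reported mean, standard deviation, extremes and row count. The inputs are the rounded values printed in this report, so agreement is only expected up to rounding.

```python
# Values copied from the statistics reported for 도엽번호5000.
mean = 36430546
std = 1145460.5
minimum, maximum = 32514040, 38815100
n = 10_000  # number of observations (no missing values for this variable)

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # range
total = mean * n                 # sum of all values

print(f"CV       = {cv:.9f}")
print(f"Variance = {variance:.7e}")
print(f"Range    = {value_range}")
print(f"Sum      = {total:.7e}")
```

Note that these sheet numbers behave like identifiers rather than measurements, so the moments are mainly useful as a consistency check on the report itself.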
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
35701064 | 8 | 0.1%
35701075 | 5 | 0.1%
37702075 | 5 | 0.1%
36616010 | 5 | 0.1%
35701083 | 5 | 0.1%
35701076 | 4 | < 0.1%
37711012 | 4 | < 0.1%
37610010 | 4 | < 0.1%
35803093 | 4 | < 0.1%
37703033 | 4 | < 0.1%
Other values (7918) | 9952 | 99.5%

Value | Count | Frequency (%)
32514040 | 1 | < 0.1%
33602001 | 1 | < 0.1%
33602002 | 1 | < 0.1%
33602013 | 1 | < 0.1%
33602016 | 1 | < 0.1%
33602023 | 2 | < 0.1%
33602024 | 1 | < 0.1%
33602033 | 1 | < 0.1%
33604007 | 1 | < 0.1%
33604008 | 2 | < 0.1%

Value | Count | Frequency (%)
38815100 | 1 | < 0.1%
38815094 | 1 | < 0.1%
38815091 | 1 | < 0.1%
38815085 | 1 | < 0.1%
38815082 | 1 | < 0.1%
38815077 | 1 | < 0.1%
38815067 | 1 | < 0.1%
38815064 | 1 | < 0.1%
38815057 | 2 | < 0.1%
38815056 | 1 | < 0.1%

도엽명5000
Text

MISSING 

Distinct: 7646
Distinct (%): 78.4%
Missing: 248
Missing (%): 2.5%
Memory size: 156.2 KiB

Length

Max length: 8
Median length: 5
Mean length: 4.9410377
Min length: 2

Characters and Unicode

Total characters: 48185
Distinct characters: 190
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
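
As an illustration of these character properties, the sketch below counts Unicode general categories with Python's standard `unicodedata` module, using the five sample rows listed for this variable. The helper name `category_counts` and the small code-to-name mapping are illustrative choices, not part of the profiling tool.

```python
import unicodedata
from collections import Counter

# The five sample rows shown for 도엽명5000 in this report.
samples = ["욕지033", "서천018", "보은064", "안양070", "보령066"]

# Readable names for the two-letter general-category codes that occur here.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Lo": "Other Letter",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

counts = category_counts(samples)
for name, count in counts.most_common():
    print(name, count)
```

Hangul syllables fall under "Other Letter" and ASCII digits under "Decimal Number", which is why those two categories dominate this variable.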

Unique

Unique: 5987
Unique (%): 61.4%

Sample

1st row: 욕지033
2nd row: 서천018
3rd row: 보은064
4th row: 안양070
5th row: 보령066
Value | Count | Frequency (%)
목포 | 19 | 0.2%
산청 | 17 | 0.2%
화원 | 14 | 0.1%
운봉 | 14 | 0.1%
함양 | 14 | 0.1%
임실 | 12 | 0.1%
하의 | 12 | 0.1%
모슬포 | 10 | 0.1%
제주 | 10 | 0.1%
성산 | 9 | 0.1%
Other values (7636) | 9621 | 98.7%

Most occurring characters

Value | Count | Frequency (%)
0 | 11319 | 23.5%
1 | 2069 | 4.3%
3 | 1933 | 4.0%
4 | 1908 | 4.0%
7 | 1901 | 3.9%
5 | 1901 | 3.9%
2 | 1898 | 3.9%
6 | 1861 | 3.9%
8 | 1861 | 3.9%
9 | 1842 | 3.8%
Other values (180) | 19692 | 40.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 28493 | 59.1%
Other Letter | 19684 | 40.9%
Open Punctuation | 4 | < 0.1%
Close Punctuation | 4 | < 0.1%

Most frequent character per category

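Other Letter
Value | Count | Frequency (%)
(not shown) | 1008 | 5.1%
(not shown) | 928 | 4.7%
(not shown) | 868 | 4.4%
(not shown) | 807 | 4.1%
(not shown) | 643 | 3.3%
(not shown) | 477 | 2.4%
(not shown) | 443 | 2.3%
(not shown) | 409 | 2.1%
(not shown) | 394 | 2.0%
(not shown) | 329 | 1.7%
Other values (168) | 13378 | 68.0%

Decimal Number
Value | Count | Frequency (%)
0 | 11319 | 39.7%
1 | 2069 | 7.3%
3 | 1933 | 6.8%
4 | 1908 | 6.7%
7 | 1901 | 6.7%
5 | 1901 | 6.7%
2 | 1898 | 6.7%
6 | 1861 | 6.5%
8 | 1861 | 6.5%
9 | 1842 | 6.5%

Open Punctuation
Value | Count | Frequency (%)
( | 4 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 4 | 100.0%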

Most occurring scripts

Value | Count | Frequency (%)
Common | 28501 | 59.1%
Hangul | 19684 | 40.9%

Most frequent character per script

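Hangul
Value | Count | Frequency (%)
(not shown) | 1008 | 5.1%
(not shown) | 928 | 4.7%
(not shown) | 868 | 4.4%
(not shown) | 807 | 4.1%
(not shown) | 643 | 3.3%
(not shown) | 477 | 2.4%
(not shown) | 443 | 2.3%
(not shown) | 409 | 2.1%
(not shown) | 394 | 2.0%
(not shown) | 329 | 1.7%
Other values (168) | 13378 | 68.0%

Common
Value | Count | Frequency (%)
0 | 11319 | 39.7%
1 | 2069 | 7.3%
3 | 1933 | 6.8%
4 | 1908 | 6.7%
7 | 1901 | 6.7%
5 | 1901 | 6.7%
2 | 1898 | 6.7%
6 | 1861 | 6.5%
8 | 1861 | 6.5%
9 | 1842 | 6.5%
Other values (2) | 8 | < 0.1%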

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 28501 | 59.1%
Hangul | 19684 | 40.9%

Most frequent character per block

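ASCII
Value | Count | Frequency (%)
0 | 11319 | 39.7%
1 | 2069 | 7.3%
3 | 1933 | 6.8%
4 | 1908 | 6.7%
7 | 1901 | 6.7%
5 | 1901 | 6.7%
2 | 1898 | 6.7%
6 | 1861 | 6.5%
8 | 1861 | 6.5%
9 | 1842 | 6.5%
Other values (2) | 8 | < 0.1%

Hangul
Value | Count | Frequency (%)
(not shown) | 1008 | 5.1%
(not shown) | 928 | 4.7%
(not shown) | 868 | 4.4%
(not shown) | 807 | 4.1%
(not shown) | 643 | 3.3%
(not shown) | 477 | 2.4%
(not shown) | 443 | 2.3%
(not shown) | 409 | 2.1%
(not shown) | 394 | 2.0%
(not shown) | 329 | 1.7%
Other values (168) | 13378 | 68.0%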

축적
Unsupported

MISSING  REJECTED  UNSUPPORTED 

Missing: 247
Missing (%): 2.5%
Memory size: 156.2 KiB

지상표본거리
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
0.25: 9247
0.51: 451
0.12: 289
1.0: 13

Length

Max length: 4
Median length: 4
Mean length: 3.9987
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.25
2nd row: 0.25
3rd row: 0.25
4th row: 0.25
5th row: 0.25

Common Values

Value | Count | Frequency (%)
0.25 | 9247 | 92.5%
0.51 | 451 | 4.5%
0.12 | 289 | 2.9%
1.0 | 13 | 0.1%
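
The 76.7% imbalance figure reported for 지상표본거리 in the Alerts is consistent with one minus the normalized Shannon entropy of these counts. The sketch below shows that calculation. Treat the formula as an assumption checked only against the numbers in this report (it also reproduces the 54.9% reported for 원점구분), not as a statement of how ydata-profiling is implemented; `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log2(k), with H the Shannon entropy (in bits) of the counts.

    0.0 means perfectly balanced classes; values near 1.0 mean one class dominates.
    """
    k = len(counts)
    if k <= 1:
        return 0.0  # a single class has no notion of balance; 0.0 is a convention here
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# Counts from the Common Values table of 지상표본거리 (0.25, 0.51, 0.12, 1.0).
gsd_counts = [9247, 451, 289, 13]
# Counts reported for 원점구분 (중부, 동부, 서부, ㈜한양지에스티, 동해).
origin_counts = [6151, 3733, 93, 13, 10]

gsd_score = imbalance_score(gsd_counts)
origin_score = imbalance_score(origin_counts)
print(f"지상표본거리: {100 * gsd_score:.1f}%")
print(f"원점구분: {100 * origin_score:.1f}%")
```

Under this definition a score of 0% would mean all categories are equally frequent, so 76.7% reflects the 92.5% share held by the value 0.25.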

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
0.25 | 9247 | 92.5%
0.51 | 451 | 4.5%
0.12 | 289 | 2.9%
1.0 | 13 | 0.1%

제작기관
Categorical

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
국토지리정보원: 10000

Length

Max length: 7
Median length: 7
Mean length: 7
Min length: 7

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 국토지리정보원
2nd row: 국토지리정보원
3rd row: 국토지리정보원
4th row: 국토지리정보원
5th row: 국토지리정보원

Common Values

Value | Count | Frequency (%)
국토지리정보원 | 10000 | 100.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
국토지리정보원 | 10000 | 100.0%

작업기관
Categorical

HIGH CARDINALITY  HIGH CORRELATION 

Distinct: 51
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
새한항업㈜: 1270
㈜범아엔지니어링 (02-3487-0011): 645
한국종합설계㈜: 607
중앙항업(주): 606
삼아항업: 589
Other values (46): 6283

Length

Max length: 30
Median length: 28
Mean length: 7.748
Min length: 3

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ㈜범아엔지니어링 (02-3487-0011)
2nd row: 삼아항업
3rd row: 공간정보기술(주)
4th row: 명화지리정보(주)
5th row: 한양지에스티

Common Values

Value | Count | Frequency (%)
새한항업㈜ | 1270 | 12.7%
㈜범아엔지니어링 (02-3487-0011) | 645 | 6.5%
한국종합설계㈜ | 607 | 6.1%
중앙항업(주) | 606 | 6.1%
삼아항업 | 589 | 5.9%
중앙항업㈜ | 564 | 5.6%
공간정보기술(주) | 544 | 5.4%
한양지에스티 | 358 | 3.6%
천우항측㈜ | 339 | 3.4%
네이버시스템㈜ | 320 | 3.2%
Other values (41) | 4158 | 41.6%
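
The table lists 중앙항업(주) and 중앙항업㈜ as separate values, and one value carries a phone number in parentheses, so part of the high cardinality may come from inconsistent notation. The sketch below shows one possible normalization using only the standard library. The helper `normalize_agency` and the phone-number pattern are hypothetical; whether such values really denote the same agency would need to be confirmed against the source data.

```python
import re
import unicodedata

# Hypothetical pattern for a trailing parenthesised phone number, e.g. " (02-3487-0011)".
PHONE_SUFFIX = re.compile(r"\s*\(\d{2,4}-\d{3,4}-\d{4}\)\s*$")

def normalize_agency(name: str) -> str:
    """Unify the single character U+321C with the spelled-out "(주)" and drop a trailing phone number."""
    # NFKC replaces the compatibility character U+321C by "(", the syllable 주, and ")".
    text = unicodedata.normalize("NFKC", name)
    text = PHONE_SUFFIX.sub("", text)
    return text.strip()

# Three values taken from the table above; "\u321c" is the one-character company mark.
raw = ["중앙항업(주)", "중앙항업\u321c", "\u321c범아엔지니어링 (02-3487-0011)"]
cleaned = [normalize_agency(value) for value in raw]

for before, after in zip(raw, cleaned):
    print(f"{before!r} -> {after!r}")
```

After this step the first two spellings collapse into one value, which would merge their counts (606 and 564) if the assumption that they name the same agency holds.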

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
새한항업㈜ | 1270 | 11.6%
㈜범아엔지니어링 | 886 | 8.1%
02-3487-0011 | 645 | 5.9%
한국종합설계㈜ | 607 | 5.5%
중앙항업(주 | 606 | 5.5%
삼아항업 | 589 | 5.4%
중앙항업㈜ | 564 | 5.2%
공간정보기술(주 | 544 | 5.0%
한양지에스티 | 358 | 3.3%
천우항측㈜ | 339 | 3.1%
Other values (41) | 4535 | 41.4%

원점구분
Categorical

HIGH CORRELATION  IMBALANCE 

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB
중부: 6151
동부: 3733
서부: 93
㈜한양지에스티: 13
동해: 10

Length

Max length: 7
Median length: 2
Mean length: 2.0065
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 동부
2nd row: 중부
3rd row: 중부
4th row: 중부
5th row: 중부

Common Values

Value | Count | Frequency (%)
중부 | 6151 | 61.5%
동부 | 3733 | 37.3%
서부 | 93 | 0.9%
㈜한양지에스티 | 13 | 0.1%
동해 | 10 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
중부 | 6151 | 61.5%
동부 | 3733 | 37.3%
서부 | 93 | 0.9%
㈜한양지에스티 | 13 | 0.1%
동해 | 10 | 0.1%

Distinct: 102
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 156.2 KiB

Length

Max length: 10
Median length: 10
Mean length: 9.5772
Min length: 2

Characters and Unicode

Total characters: 95772
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 2014-01-24
2nd row: 2012-07-13
3rd row: 2016-12-15
4th row: 2018-11-29
5th row: 2011-10-26
Value | Count | Frequency (%)
2012-07-13 | 1329 | 13.3%
2014-12-26 | 774 | 7.7%
2014-01-24 | 629 | 6.3%
2011-11-14 | 589 | 5.9%
(not shown) | 522 | 5.2%
2011-11-04 | 469 | 4.7%
2011-10-26 | 358 | 3.6%
2015-11-27 | 256 | 2.6%
2016-12-15 | 254 | 2.5%
2017-12-01 | 195 | 1.9%
Other values (92) | 4625 | 46.2%
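
Most values of this variable look like ISO dates (YYYY-MM-DD), but the minimum length of 2 and the row with no visible value suggest that some entries are not dates. The sketch below is one way to separate the two groups before converting the column. The helper `parse_iso_date` and the non-date example strings are hypothetical and are not taken from the dataset.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def parse_iso_date(text):
    """Return a date for a strict YYYY-MM-DD string, or None if it is not a valid date."""
    match = ISO_DATE.fullmatch(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. month 13 or day 32
        return None

# The first three strings are sample rows from this report; the last three are
# hypothetical non-date entries used only to exercise the fallback path.
values = ["2014-01-24", "2012-07-13", "2016-12-15", "", "n/a", "2014-13-01"]
parsed = [parse_iso_date(value) for value in values]
rejected = [value for value, result in zip(values, parsed) if result is None]

print("parsed:", [d.isoformat() for d in parsed if d is not None])
print("rejected:", rejected)
```

Keeping the rejected strings in a separate list makes it easy to inspect what the non-date entries actually contain before deciding how to treat them.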

Most occurring characters

Value | Count | Frequency (%)
1 | 24649 | 25.7%
- | 20000 | 20.9%
2 | 17541 | 18.3%
0 | 16737 | 17.5%
4 | 3769 | 3.9%
7 | 3541 | 3.7%
6 | 2436 | 2.5%
3 | 2208 | 2.3%
5 | 1733 | 1.8%
8 | 1708 | 1.8%
Other values (3) | 1450 | 1.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 75720 | 79.1%
Dash Punctuation | 20000 | 20.9%
Other Letter | 52 | 0.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 24649 | 32.6%
2 | 17541 | 23.2%
0 | 16737 | 22.1%
4 | 3769 | 5.0%
7 | 3541 | 4.7%
6 | 2436 | 3.2%
3 | 2208 | 2.9%
5 | 1733 | 2.3%
8 | 1708 | 2.3%
9 | 1398 | 1.8%

Other Letter
Value | Count | Frequency (%)
(not shown) | 26 | 50.0%
(not shown) | 26 | 50.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 20000 | 100.0%

Most occurring scripts

ValueCountFrequency (%)