Overview

Dataset statistics

Number of variables: 12
Number of observations: 2388
Missing cells: 14328
Missing cells (%): 50.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 242.7 KiB
Average record size in memory: 104.1 B
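
The headline figures can be rechecked from one another. The following is a minimal sketch, assuming that the six fully empty columns named in the alerts account for every missing cell and that 1 KiB equals 1024 bytes; the variable names are illustrative only.

```python
# Recheck the overview figures using only numbers quoted in the report.
n_variables = 12
n_observations = 2388
n_empty_columns = 6          # Unnamed: 6 .. Unnamed: 11, each 100% missing
total_size_kib = 242.7

total_cells = n_variables * n_observations
missing_cells = n_empty_columns * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size, assuming 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(missing_cells)                 # 14328
print(f"{missing_pct:.1f}%")         # 50.0%
print(f"{avg_record_bytes:.1f} B")   # 104.1 B
```

All three results agree with the values reported above, so the 50.0% missing rate is explained entirely by the empty columns rather than by gaps in the six populated ones.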

Variable types

Categorical: 3
Text: 2
Numeric: 1
Unsupported: 6

Dataset

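Description: Inspection results for rice and other milled grains (정곡) managed by the National Agricultural Products Quality Management Service (국립농산물품질관리원), including application year, city/county, production year, use, origin, and inspected quantity, among other fields
Author: National Agricultural Products Quality Management Service (국립농산물품질관리원)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220204000000001690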

Alerts

용도 is highly imbalanced (91.1%) [Imbalance]
Unnamed: 6 has 2388 (100.0%) missing values [Missing]
Unnamed: 7 has 2388 (100.0%) missing values [Missing]
Unnamed: 8 has 2388 (100.0%) missing values [Missing]
Unnamed: 9 has 2388 (100.0%) missing values [Missing]
Unnamed: 10 has 2388 (100.0%) missing values [Missing]
Unnamed: 11 has 2388 (100.0%) missing values [Missing]
Unnamed: 6 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 7 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 8 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 9 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 10 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
Unnamed: 11 is an unsupported type, check if it needs cleaning or further analysis [Unsupported]
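
The six Unnamed columns carry no data at all, so the usual cleaning step is to drop them before further analysis. The following is a minimal sketch using only the standard library, assuming rows are held as dictionaries (as `csv.DictReader` would yield them) and that a missing cell is encoded as `None` or a blank string. The helper name `drop_empty_columns` and the two example rows are hypothetical; only the column names and category values are taken from the report.

```python
def drop_empty_columns(rows):
    """Return copies of rows without columns that are empty in every row.

    A value counts as missing when it is None or a blank string; this is an
    assumption about how the source file encodes missing cells.
    """
    def is_missing(value):
        return value is None or str(value).strip() == ""

    columns = list(rows[0].keys()) if rows else []
    keep = [c for c in columns if not all(is_missing(r.get(c)) for r in rows)]
    return [{c: r.get(c) for c in keep} for r in rows]


# Two made-up rows shaped like the profiled table.
rows = [
    {"신청년도": "2009", "용도": "정곡", "Unnamed: 6": None, "Unnamed: 7": ""},
    {"신청년도": "2010", "용도": "대북", "Unnamed: 6": None, "Unnamed: 7": None},
]
cleaned = drop_empty_columns(rows)
print(list(cleaned[0].keys()))   # ['신청년도', '용도']
```

Columns that are only partly empty are kept, which matches the alerts above: they flag columns with 100.0% missing values, not rows.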

Reproduction

Analysis started: 2024-03-23 07:43:08.785148
Analysis finished: 2024-03-23 07:43:10.396607
Duration: 1.61 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

신청년도 (application year)
Categorical

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 KiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2009
2nd row: 2009
3rd row: 2009
4th row: 2009
5th row: 2009

Common Values

Value  Count  Frequency (%)
2011   598    25.0%
2013   575    24.1%
2010   554    23.2%
2012   552    23.1%
2009   109    4.6%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; the counts are those listed under Common Values]

시도 (province and city/county)
Text

Distinct: 90
Distinct (%): 3.8%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 KiB

Length

Max length: 12
Median length: 8
Mean length: 8.1235343
Min length: 7

Characters and Unicode

Total characters: 19399
Distinct characters: 87
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
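
As an illustration of those character properties, the sketch below uses Python's standard `unicodedata` module to count the general categories in the first sample row of this column. The helper name `category_counts` is hypothetical. Hangul syllables fall in category `Lo` (Other Letter) and the ASCII space in `Zs` (Space Separator), which are the two categories the report finds.

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count characters in `text` by Unicode general category (e.g. 'Lo', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)


sample = "강원도 고성군"            # first sample row of the 시도 column
counts = category_counts(sample)
print(counts["Lo"], counts["Zs"])   # 6 1
print(unicodedata.name("강"))       # HANGUL SYLLABLE GANG
```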

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 강원도 고성군
2nd row: 강원도 고성군
3rd row: 강원도 고성군
4th row: 강원도 고성군
5th row: 강원도 인제군

Value              Count  Frequency (%)
전라남도           449    9.1%
경상북도           394    8.0%
전라북도           256    5.2%
경상남도           252    5.1%
경기도             245    5.0%
강원도             221    4.5%
충청북도           217    4.4%
충청남도           205    4.2%
북구               52     1.1%
논산시             43     0.9%
Other values (97)  2600   52.7%
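
The counts in the table above sum to 4934, more than the 2388 rows, which suggests the table counts whitespace-separated tokens (province, then city or county) rather than whole cell values. The following is a minimal sketch of that kind of token count, assuming tokens are split on whitespace; the helper name `token_counts` is hypothetical, and the input is limited to the five sample rows shown in the report.

```python
from collections import Counter


def token_counts(values):
    """Count whitespace-separated tokens across all values."""
    return Counter(token for value in values for token in value.split())


# The five sample rows shown for the 시도 column.
sample_rows = ["강원도 고성군", "강원도 고성군", "강원도 고성군",
               "강원도 고성군", "강원도 인제군"]
counts = token_counts(sample_rows)
print(counts.most_common(2))   # [('강원도', 5), ('고성군', 4)]

# Consistency check on the report's own totals: if every space separates
# two tokens, the number of tokens equals rows + spaces.
table_total = 449 + 394 + 256 + 252 + 245 + 221 + 217 + 205 + 52 + 43 + 2600
print(table_total == 2388 + 2546)   # True
```

The second check uses the 2546 space characters reported for this column, so the token interpretation is consistent with the character statistics.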

Most occurring characters

Value              Count  Frequency (%)
(space)            2546   13.1%
                   2239   11.5%
                   1274   6.6%
                   1151   5.9%
                   995    5.1%
                   968    5.0%
                   919    4.7%
                   757    3.9%
                   705    3.6%
                   672    3.5%
Other values (77)  7173   37.0%

Most occurring categories

Value            Count  Frequency (%)
Other Letter     16853  86.9%
Space Separator  2546   13.1%

Most frequent character per category

Other Letter

Value              Count  Frequency (%)
                   2239   13.3%
                   1274   7.6%
                   1151   6.8%
                   995    5.9%
                   968    5.7%
                   919    5.5%
                   757    4.5%
                   705    4.2%
                   672    4.0%
                   479    2.8%
Other values (76)  6694   39.7%

Space Separator

Value    Count  Frequency (%)
(space)  2546   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Hangul  16853  86.9%
Common  2546   13.1%

Most frequent character per script

Hangul

Value              Count  Frequency (%)
                   2239   13.3%
                   1274   7.6%
                   1151   6.8%
                   995    5.9%
                   968    5.7%
                   919    5.5%
                   757    4.5%
                   705    4.2%
                   672    4.0%
                   479    2.8%
Other values (76)  6694   39.7%

Common

Value    Count  Frequency (%)
(space)  2546   100.0%

Most occurring blocks

Value   Count  Frequency (%)
Hangul  16853  86.9%
ASCII   2546   13.1%

Most frequent character per block

ASCII

Value    Count  Frequency (%)
(space)  2546   100.0%

Hangul

Value              Count  Frequency (%)
                   2239   13.3%
                   1274   7.6%
                   1151   6.8%
                   995    5.9%
                   968    5.7%
                   919    5.5%
                   757    4.5%
                   705    4.2%
                   672    4.0%
                   479    2.8%
Other values (76)  6694   39.7%

연산 (production year)
Real number (ℝ)

Distinct: 9
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2009.1323
Minimum: 2005
Maximum: 2013
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 21.1 KiB

Quantile statistics

Minimum: 2005
5th percentile: 2005
Q1: 2008
Median: 2009
Q3: 2011
95th percentile: 2012
Maximum: 2013
Range: 8
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.8817342
Coefficient of variation (CV): 0.00093659045
Kurtosis: -0.40265705
Mean: 2009.1323
Median Absolute Deviation (MAD): 1
Skewness: -0.45401308
Sum: 4797808
Variance: 3.5409234
Monotonicity: Not monotonic
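
Several of these statistics can be reproduced from the value counts the report gives for this variable. The following is a minimal sketch using the standard `statistics` module, assuming the reported variance is the sample variance (denominator n - 1); that assumption is what makes the numbers agree, since the report does not state which estimator it uses.

```python
import statistics

# Frequency table of 연산 as quoted in the report: value -> count.
freq = {2005: 140, 2006: 117, 2007: 168, 2008: 386, 2009: 491,
        2010: 447, 2011: 425, 2012: 204, 2013: 10}

# Expand to one entry per observation, in ascending order.
values = [year for year, count in sorted(freq.items()) for _ in range(count)]

mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance (n - 1 denominator)
std = statistics.stdev(values)
cv = std / mean                          # coefficient of variation
median = statistics.median(values)

print(len(values), sum(values))                            # 2388 4797808
print(round(mean, 4), round(variance, 4), round(std, 4))   # 2009.1323 3.5409 1.8817
print(median)
```

The coefficient of variation is tiny only because the values are calendar years near 2009; the standard deviation of about 1.9 years is the more informative measure of spread here.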
[Figure: histogram with fixed size bins (bins=9)]

Values sorted by count:

Value  Count  Frequency (%)
2009   491    20.6%
2010   447    18.7%
2011   425    17.8%
2008   386    16.2%
2012   204    8.5%
2007   168    7.0%
2005   140    5.9%
2006   117    4.9%
2013   10     0.4%

Values in ascending order:

Value  Count  Frequency (%)
2005   140    5.9%
2006   117    4.9%
2007   168    7.0%
2008   386    16.2%
2009   491    20.6%
2010   447    18.7%
2011   425    17.8%
2012   204    8.5%
2013   10     0.4%

Values in descending order:

Value  Count  Frequency (%)
2013   10     0.4%
2012   204    8.5%
2011   425    17.8%
2010   447    18.7%
2009   491    20.6%
2008   386    16.2%
2007   168    7.0%
2006   117    4.9%
2005   140    5.9%

용도 (use)
Categorical

IMBALANCE 

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 정곡
2nd row: 정곡
3rd row: 정곡
4th row: 정곡
5th row: 정곡

Common Values

Value                    Count  Frequency (%)
정곡 (milled grain)      2361   98.9%
대북 (for North Korea)   27     1.1%
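
The 91.1% in the imbalance alert is not the majority share, which is 98.9%. It is consistent with an entropy-based score: one minus the Shannon entropy of the class distribution divided by the maximum entropy for that number of classes. The following is a minimal sketch under that assumption; `imbalance_score` is a hypothetical helper written for this illustration, not a function of the profiling library.

```python
import math


def imbalance_score(counts):
    """One minus the Shannon entropy of the class distribution, normalised
    by the maximum entropy log2(k) for k classes. 0 means perfectly
    balanced; values near 1 mean one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # convention chosen here for a single-class column
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


score = imbalance_score([2361, 27])   # counts of 정곡 and 대북
print(f"{score:.1%}")                 # 91.1%
```

With only 27 of 2388 rows in the minority class, any model or summary built on this column will be driven almost entirely by 정곡.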

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; the counts are those listed under Common Values]

원산지 (country of origin)
Categorical

Distinct: 8
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 KiB

Length

Max length: 4
Median length: 2
Mean length: 2.001675
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 국산
2nd row: 미국
3rd row: 중국
4th row: 태국
5th row: 국산

Common Values

Value                Count  Frequency (%)
국산 (domestic)      1275   53.4%
중국 (China)         455    19.1%
미국 (USA)           415    17.4%
태국 (Thailand)      232    9.7%
호주 (Australia)     5      0.2%
인도 (India)         3      0.1%
베트남 (Vietnam)     2      0.1%
파키스탄 (Pakistan)  1      < 0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of the common values; the counts are those listed under Common Values]

[Variable name missing]
Text

Distinct: 1969
Distinct (%): 82.5%
Missing: 0
Missing (%): 0.0%
Memory size: 18.8 KiB

Length

Max length: 11
Median length: 8
Mean length: 8.2039363
Min length: 3

Characters and Unicode

Total characters: 19591
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1808
Unique (%): 75.7%

Sample

1st row: 3,657,080
2nd row: 137,800
3rd row: 482,000
4th row: 15,000
5th row: 2,423,600

Value                Count  Frequency (%)
100,000              27     1.1%
50,000               24     1.0%
200,000              20     0.8%
150,000              16     0.7%
30,000               15     0.6%
20,000               13     0.5%
120,000              13     0.5%
10,000               13     0.5%
80,000               10     0.4%
140,000              9      0.4%
Other values (1959)  2228   93.3%
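
This variable is profiled as text because its values are numbers written with thousands separators, so it needs converting before any numeric analysis. The following is a minimal sketch, assuming the commas are thousands separators and the quantities are non-negative integers; the helper name `parse_quantity` is hypothetical. The report counts 12 distinct characters, and the ten digits plus the comma account for at most 11, so at least one other character occurs somewhere in the column. For that reason the sketch returns `None` for anything it cannot parse instead of raising or guessing.

```python
def parse_quantity(text):
    """Parse a thousands-separated integer string such as '3,657,080'.

    Returns None when the text is not a plain comma-grouped ASCII integer,
    so unexpected values can be inspected instead of silently coerced.
    """
    cleaned = text.strip().replace(",", "")
    if cleaned.isascii() and cleaned.isdigit():
        return int(cleaned)
    return None


# The five sample rows shown in the report.
samples = ["3,657,080", "137,800", "482,000", "15,000", "2,423,600"]
print([parse_quantity(s) for s in samples])
# [3657080, 137800, 482000, 15000, 2423600]
```

Rows for which the helper returns `None` are the ones worth checking by hand before deciding how to treat them.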