Overview

Dataset statistics

Number of variables: 7
Number of observations: 8831
Missing cells: 120
Missing cells (%): 0.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 517.6 KiB
Average record size in memory: 60.0 B
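The dataset-level figures above follow from simple counts. Below is a minimal sketch in plain Python. It assumes the table is held as a list of equal-length row tuples with None marking a missing cell. That representation is chosen here for illustration and is not how ydata-profiling stores data.

```python
from typing import Sequence


def dataset_summary(rows: Sequence[tuple]) -> dict:
    """Missing-cell and duplicate-row figures for a list of row tuples.

    Assumption: every row has the same number of columns and None marks a missing cell.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    missing = sum(cell is None for row in rows for cell in row)
    # A row counts as a duplicate if an identical row appeared earlier.
    seen = set()
    duplicates = 0
    for row in rows:
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)
    total_cells = n_rows * n_cols
    return {
        "observations": n_rows,
        "variables": n_cols,
        "missing_cells": missing,
        "missing_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
    }


# Hypothetical toy rows: one missing cell and one repeated row.
toy = [
    (2019, "강원도 강릉시", 1000.0),
    (2019, "강원도 강릉시", 1000.0),
    (2020, "강원도 고성군", None),
]
summary = dataset_summary(toy)
print(summary)

# The report's own figures: 120 missing cells out of 8831 * 7 = 61817 cells.
print(round(100 * 120 / (8831 * 7), 1))  # 0.2
```

The same arithmetic gives the average record size: 517.6 KiB × 1024 / 8831 rows ≈ 60.0 B.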

Variable types

Numeric: 4
Text: 1
Categorical: 2

Dataset

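Description: Inspection records for polished grain (정곡) such as rice, managed by the National Agricultural Products Quality Management Service (application year 신청년도, city/county 시군, production year 연산, use 용도, country of origin 원산지, inspected quantity 검사수량, etc.)
Author: National Agricultural Products Quality Management Service (국립농산물품질관리원)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220204000000001690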

Alerts

신청년도 is highly overall correlated with 연산 [High correlation]
연산 is highly overall correlated with 신청년도 [High correlation]
용도 is highly imbalanced (95.9%) [Imbalance]
연산 has 120 (1.4%) missing values [Missing]
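The two correlation alerts flag 신청년도 and 연산 as strongly associated. The report does not say which coefficient underlies its "overall" correlation. One plausible measure for a pair of numeric columns is Spearman's rank correlation, sketched below. Treating it as the measure ydata-profiling applied here is an assumption. Tied values receive the average of their ranks, and the pairs shown are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


years = [2009, 2010, 2010, 2012, 2015]  # hypothetical 신청년도 values
crops = [2008, 2009, 2010, 2011, 2014]  # hypothetical 연산 values
print(round(spearman(years, crops), 4))  # 0.9747
```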

Reproduction

Analysis started: 2024-03-23 07:43:39.420861
Analysis finished: 2024-03-23 07:43:45.241771
Duration: 5.82 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

신청년도 (application year)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 14
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.718
Minimum: 2009
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 77.7 KiB

Quantile statistics

Minimum: 2009
5th percentile: 2009
Q1: 2012
Median: 2016
Q3: 2019
95th percentile: 2022
Maximum: 2022
Range: 13
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 3.9734954
Coefficient of variation (CV): 0.0019712556
Kurtosis: -1.1671541
Mean: 2015.718
Median Absolute Deviation (MAD): 3
Skewness: -0.058006551
Sum: 17800806
Variance: 15.788666
Monotonicity: Decreasing
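A monotonicity of "Decreasing" for 신청년도 means that each row's application year is no later than the one before it. Repeated years are allowed, since every year occurs hundreds of times. The sketch below checks this with non-strict comparisons. The labels mirror those in this report. Treating ties as allowed, and reporting a constant sequence as increasing, are assumptions of this sketch.

```python
from typing import Sequence


def monotonicity(values: Sequence[float]) -> str:
    """Classify a sequence using non-strict comparisons between neighbours.

    Assumption: equal neighbours are allowed, and a constant sequence is
    reported as "Increasing".
    """
    pairs = list(zip(values, values[1:]))
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


print(monotonicity([2022, 2022, 2021, 2009]))  # Decreasing
print(monotonicity([150, 720, 230]))  # Not monotonic
```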
[Figure: Histogram with fixed size bins (bins=14)]
Common values

Value  Count  Frequency (%)
2019  737  8.3%
2015  716  8.1%
2014  705  8.0%
2022  694  7.9%
2018  670  7.6%
2016  627  7.1%
2021  625  7.1%
2020  603  6.8%
2017  600  6.8%
2011  599  6.8%
Other values (4)  2255  25.5%

Minimum 10 values

Value  Count  Frequency (%)
2009  535  6.1%
2010  588  6.7%
2011  599  6.8%
2012  552  6.3%
2013  580  6.6%
2014  705  8.0%
2015  716  8.1%
2016  627  7.1%
2017  600  6.8%
2018  670  7.6%

Maximum 10 values

Value  Count  Frequency (%)
2022  694  7.9%
2021  625  7.1%
2020  603  6.8%
2019  737  8.3%
2018  670  7.6%
2017  600  6.8%
2016  627  7.1%
2015  716  8.1%
2014  705  8.0%
2013  580  6.6%
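The minimum and maximum tables together list all 14 distinct years (2009 to 2022), so the full 신청년도 column can be rebuilt from its counts and the summary statistics checked. The sketch below uses Python's standard statistics module. With method="inclusive", statistics.quantiles interpolates linearly between order statistics. Assuming the report uses the same quantile rule is reasonable here, because it reproduces Q1, the median and Q3.

```python
import statistics

# Counts per 신청년도 value, taken from the frequency tables above.
counts = {
    2009: 535, 2010: 588, 2011: 599, 2012: 552, 2013: 580, 2014: 705, 2015: 716,
    2016: 627, 2017: 600, 2018: 670, 2019: 737, 2020: 603, 2021: 625, 2022: 694,
}

# Expand the table back into one value per row.
data = [year for year, n in sorted(counts.items()) for _ in range(n)]

mean = statistics.mean(data)
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
variance = statistics.variance(data)  # sample variance (n - 1 denominator)
std = statistics.stdev(data)
cv = std / mean  # coefficient of variation
mad = statistics.median([abs(x - median) for x in data])  # median absolute deviation

print(len(data), sum(data))  # 8831 17800806
print(round(mean, 3), q1, median, q3)  # 2015.718 2012.0 2016.0 2019.0
# Compare with Variance, Standard deviation, CV and MAD in the report.
print(variance, std, cv, mad)
```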

시군 (city/county)
Text

Distinct: 100
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 69.1 KiB

Length

Max length: 12
Median length: 8
Mean length: 8.1174272
Min length: 7

Characters and Unicode

Total characters: 71685
Distinct characters: 94
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
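The category breakdown can be reproduced with Python's standard unicodedata module. Hangul syllables fall in general category "Lo" and the space in "Zs". The mapping from these two-letter codes to names such as "Other Letter" is written out by hand here, as an assumption about how the report's labels correspond. unicodedata does not expose scripts or blocks, so those are not covered. The sample strings are taken from the Sample rows of this variable.

```python
import unicodedata
from collections import Counter

# Hand-written names for the two general categories that occur in 시군.
CATEGORY_NAMES = {"Lo": "Other Letter", "Zs": "Space Separator"}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = ["강원도 강릉시", "강원도 강릉시", "강원도 고성군"]
print(category_counts(sample))  # Counter({'Other Letter': 18, 'Space Separator': 3})
```

Applied to the whole column, the same counting gives the 62279 Other Letter and 9406 Space Separator characters reported, which sum to the 71685 total characters.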

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 강원도 강릉시
2nd row: 강원도 강릉시
3rd row: 강원도 강릉시
4th row: 강원도 강릉시
5th row: 강원도 고성군

Words

Value  Count  Frequency (%)
전라남도  1689  9.3%
경상북도  1611  8.8%
전라북도  1019  5.6%
경상남도  935  5.1%
경기도  921  5.1%
충청남도  849  4.7%
충청북도  758  4.2%
강원도  644  3.5%
북구  160  0.9%
논산시  151  0.8%
Other values (108)  9495  52.1%
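These counts sum to roughly 18,200 rather than 8,831. They therefore count words (province and city/county names separated by spaces), not whole values, and each frequency is relative to the total word count. Below is a minimal sketch that assumes words are whitespace-separated tokens. It is applied to the five sample rows shown above.

```python
from collections import Counter


def word_counts(values):
    """Split each value on whitespace and count the resulting words."""
    return Counter(word for text in values for word in text.split())


rows = ["강원도 강릉시", "강원도 강릉시", "강원도 강릉시", "강원도 강릉시", "강원도 고성군"]
counts = word_counts(rows)
total = sum(counts.values())
for word, n in counts.most_common():
    print(word, n, f"{100 * n / total:.1f}%")
# 강원도 5 50.0%
# 강릉시 4 40.0%
# 고성군 1 10.0%
```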

Most occurring characters

Value  Count  Frequency (%)
(space)  9406  13.1%
?  8483  11.8%
?  4595  6.4%
?  4356  6.1%
?  3808  5.3%
?  3722  5.2%
?  3548  4.9%
?  2874  4.0%
?  2708  3.8%
?  2666  3.7%
Other values (84)  25519  35.6%

Most occurring categories

Value  Count  Frequency (%)
Other Letter  62279  86.9%
Space Separator  9406  13.1%

Most frequent character per category

Other Letter
Value  Count  Frequency (%)
?  8483  13.6%
?  4595  7.4%
?  4356  7.0%
?  3808  6.1%
?  3722  6.0%
?  3548  5.7%
?  2874  4.6%
?  2708  4.3%
?  2666  4.3%
?  1930  3.1%
Other values (83)  23589  37.9%

Space Separator
Value  Count  Frequency (%)
(space)  9406  100.0%

Most occurring scripts

Value  Count  Frequency (%)
Hangul  62279  86.9%
Common  9406  13.1%

Most frequent character per script

Hangul
Value  Count  Frequency (%)
?  8483  13.6%
?  4595  7.4%
?  4356  7.0%
?  3808  6.1%
?  3722  6.0%
?  3548  5.7%
?  2874  4.6%
?  2708  4.3%
?  2666  4.3%
?  1930  3.1%
Other values (83)  23589  37.9%

Common
Value  Count  Frequency (%)
(space)  9406  100.0%

Most occurring blocks

Value  Count  Frequency (%)
Hangul  62279  86.9%
ASCII  9406  13.1%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
(space)  9406  100.0%

Hangul
Value  Count  Frequency (%)
?  8483  13.6%
?  4595  7.4%
?  4356  7.0%
?  3808  6.1%
?  3722  6.0%
?  3548  5.7%
?  2874  4.6%
?  2708  4.3%
?  2666  4.3%
?  1930  3.1%
Other values (83)  23589  37.9%

시군구코드 (city/county/district code)
Real number (ℝ)

Distinct: 46
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 502.24018
Minimum: 110
Maximum: 900
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 77.7 KiB

Quantile statistics

Minimum: 110
5th percentile: 113
Q1: 170
Median: 610
Q3: 790
95th percentile: 870
Maximum: 900
Range: 790
Interquartile range (IQR): 620

Descriptive statistics

Standard deviation: 302.20918
Coefficient of variation (CV): 0.60172244
Kurtosis: -1.8130431
Mean: 502.24018
Median Absolute Deviation (MAD): 260
Skewness: -0.12875996
Sum: 4435283
Variance: 91330.391
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=46)]
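The bin counts used in this report are 14 for 신청년도, 46 here, 19 for 연산 and 50 for 검사수량. Each equals the variable's number of distinct values, capped at 50. The sketch below builds an equal-width histogram under that rule. The cap of 50 (MAX_BINS) is inferred from this report, not taken from ydata-profiling documentation. The last bin is closed on the right so that the maximum is counted.

```python
from typing import List, Sequence

MAX_BINS = 50  # assumed cap, consistent with bins=50 for 검사수량


def fixed_width_histogram(values: Sequence[float], max_bins: int = MAX_BINS) -> List[int]:
    """Equal-width histogram with min(number of distinct values, max_bins) bins."""
    n_bins = min(len(set(values)), max_bins)
    lo, hi = min(values), max(values)
    if hi == lo:
        return [len(values)]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in values:
        # Clamp so the maximum value falls into the last bin.
        index = min(int((x - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical codes: 3 distinct values, so 3 bins spanning 110 to 900.
print(fixed_width_histogram([110, 113, 113, 900]))  # [3, 0, 1]
```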
Common values

Value  Count  Frequency (%)
150  531  6.0%
720  439  5.0%
230  423  4.8%
130  420  4.8%
770  412  4.7%
170  366  4.1%
820  360  4.1%
113  319  3.6%
210  292  3.3%
730  281  3.2%
Other values (36)  4988  56.5%

Minimum 10 values

Value  Count  Frequency (%)
110  157  1.8%
113  319  3.6%
121  132  1.5%
130  420  4.8%
131  123  1.4%
140  176  2.0%
150  531  6.0%
170  366  4.1%
171  1  < 0.1%
180  86  1.0%

Maximum 10 values

Value  Count  Frequency (%)
900  134  1.5%
890  74  0.8%
880  98  1.1%
870  214  2.4%
860  147  1.7%
850  69  0.8%
840  226  2.6%
830  203  2.3%
820  360  4.1%
810  235  2.7%

연산 (production year)
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 19
Distinct (%): 0.2%
Missing: 120
Missing (%): 1.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 2013.6799
Minimum: 2004
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 77.7 KiB

Quantile statistics

Minimum: 2004
5th percentile: 2007
Q1: 2010
Median: 2014
Q3: 2017
95th percentile: 2020
Maximum: 2022
Range: 18
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.1684905
Coefficient of variation (CV): 0.0020700859
Kurtosis: -0.96991186
Mean: 2013.6799
Median Absolute Deviation (MAD): 3
Skewness: -0.11124217
Sum: 17541166
Variance: 17.376313
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=19)]
Common values

Value  Count  Frequency (%)
2016  725  8.2%
2018  711  8.1%
2014  680  7.7%
2012  673  7.6%
2011  635  7.2%
2013  634  7.2%
2008  611  6.9%
2015  586  6.6%
2009  585  6.6%
2020  539  6.1%
Other values (9)  2332  26.4%

Minimum 10 values

Value  Count  Frequency (%)
2004  3  < 0.1%
2005  189  2.1%
2006  132  1.5%
2007  224  2.5%
2008  611  6.9%
2009  585  6.6%
2010  488  5.5%
2011  635  7.2%
2012  673  7.6%
2013  634  7.2%

Maximum 10 values

Value  Count  Frequency (%)
2022  1  < 0.1%
2021  231  2.6%
2020  539  6.1%
2019  532  6.0%
2018  711  8.1%
2017  532  6.0%
2016  725  8.2%
2015  586  6.6%
2014  680  7.7%
2013  634  7.2%
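연산 has 120 missing values, so its statistics and its frequencies can use different denominators. The arithmetic below uses only figures from this report. It shows that the mean is taken over the 8,711 non-missing rows. The frequency percentages, by contrast, are consistent with dividing by all 8,831 rows.

```python
# Figures for 연산 taken from this report.
n_rows = 8831
n_missing = 120
total = 17541166  # Sum of the non-missing values

n_present = n_rows - n_missing
print(n_present)  # 8711
print(round(total / n_present, 4))  # 2013.6799, the reported Mean
print(round(100 * n_missing / n_rows, 1))  # 1.4, the reported Missing (%)

# Frequency of 2018 under each possible denominator.
count_2018 = 711
print(round(100 * count_2018 / n_rows, 1))  # 8.1, as reported
print(round(100 * count_2018 / n_present, 1))  # 8.2, if missing rows were excluded
```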

용도 (use)
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 69.1 KiB
정곡: 8771
<NA>: 31
대북: 29

Length

Max length: 4
Median length: 2
Mean length: 2.0070207
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 정곡
2nd row: 정곡
3rd row: 정곡
4th row: 정곡
5th row: 정곡

Common Values

Value  Count  Frequency (%)
정곡 (polished grain)  8771  99.3%
<NA>  31  0.4%
대북 (for North Korea)  29  0.3%
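The 95.9% imbalance in the Alerts is reproduced by one minus the normalised Shannon entropy of these three counts. Entropy is taken in base k, where k is the number of categories, so it lies between 0 and 1. The sketch below assumes this is the definition used. It matches the reported figure, but the report itself does not state the formula.

```python
import math
from typing import Sequence


def imbalance_score(counts: Sequence[int]) -> float:
    """1 minus the category entropy, normalised to [0, 1] by using base k.

    A perfectly balanced column scores 0; one dominant category approaches 1.
    """
    k = len(counts)
    if k <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total, k) for c in counts if c > 0)
    return 1 - entropy


# Counts for 용도 from the table above: 정곡, <NA>, 대북.
print(f"{100 * imbalance_score([8771, 31, 29]):.1f}%")  # 95.9%
```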

Length

[Figure: Histogram of lengths of the category]


원산지 (country of origin)
Categorical

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 69.1 KiB
국산: 4102
중국: 1831
미국: 1592
태국: 734
베트남: 343
Other values (4): 229
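The overview lists the five most common values and folds the remaining four into a single "Other values (4)" entry. The sketch below reproduces that summary from the full counts in the Common Values table, with frequencies taken as percentages of all rows. The function name top_values is hypothetical.

```python
from collections import Counter


def top_values(counts: Counter, k: int = 5):
    """Return the k most common values plus one 'Other values (m)' row.

    Each row is (value, count, percentage of all rows).
    """
    n = sum(counts.values())
    rows = [(value, c, 100 * c / n) for value, c in counts.most_common(k)]
    rest = counts.most_common()[k:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, 100 * other / n))
    return rows


origin = Counter({"국산": 4102, "중국": 1831, "미국": 1592, "태국": 734, "베트남": 343,
                  "호주": 152, "<NA>": 41, "인도": 35, "파키스탄": 1})
for value, c, pct in top_values(origin):
    print(f"{value}  {c}  {pct:.1f}%")
```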

Length

Max length: 4
Median length: 2
Mean length: 2.0483524
Min length: 2

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 중국
2nd row: 미국
3rd row: 중국
4th row: 국산
5th row: 국산

Common Values

Value  Count  Frequency (%)
국산 (domestic)  4102  46.5%
중국 (China)  1831  20.7%
미국 (United States)  1592  18.0%
태국 (Thailand)  734  8.3%
베트남 (Vietnam)  343  3.9%
호주 (Australia)  152  1.7%
<NA>  41  0.5%
인도 (India)  35  0.4%
파키스탄 (Pakistan)  1  < 0.1%

Length

[Figure: Histogram of lengths of the category]


검사수량 (inspected quantity)
Real number (ℝ)

Distinct: 6065
Distinct (%): 68.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 946793.4
Minimum: 0
Maximum: 20756080
Zeros: 2
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 77.7 KiB

Quantile statistics

Minimum: 0
5th percentile: 11060
Q1: 115160
Median: 375600
Q3: 1120235
95th percentile: 3649675
Maximum: 20756080
Range: 20756080
Interquartile range (IQR): 1005075

Descriptive statistics

Standard deviation: 1560749
Coefficient of variation (CV): 1.6484579
Kurtosis: 29.379719
Mean: 946793.4
Median Absolute Deviation (MAD): 324720
Skewness: 4.349752
Sum: 8.3611325 × 10^9
Variance: 2.4359376 × 10^12
Monotonicity: Not monotonic
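A skewness of 4.35 and a kurtosis of 29.4 indicate a long right tail: the mean (946,793.4) is about 2.5 times the median (375,600). The sketch below computes bias-adjusted sample skewness (G1) and excess kurtosis (G2). These are the estimators behind pandas Series.skew and Series.kurt. That ydata-profiling relies on them for the figures above is an assumption.

```python
import math
from typing import Sequence


def sample_skewness(x: Sequence[float]) -> float:
    """Bias-adjusted Fisher-Pearson skewness G1; needs at least 3 values."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def sample_excess_kurtosis(x: Sequence[float]) -> float:
    """Bias-adjusted excess kurtosis G2 (0 for a normal distribution); needs at least 4 values."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


# A symmetric toy sample has zero skewness; one large value makes it right-skewed.
print(sample_skewness([1, 2, 3, 4, 5]), sample_excess_kurtosis([1, 2, 3, 4, 5]))
print(sample_skewness([1, 2, 3, 4, 100]))
```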
[Figure: Histogram with fixed size bins (bins=50)]
Common values

Value  Count  Frequency (%)
100000.0  116  1.3%
200000.0  79  0.9%
50000.0  65  0.7%
150000.0  48  0.5%
10000.0  44  0.5%
30000.0  42  0.5%
20000.0  37  0.4%
60000.0  37  0.4%
80000.0  36  0.4%
40000.0  35  0.4%
Other values (6055)  8292  93.9%

Minimum 10 values

Value  Count  Frequency (%)
0.0  2  < 0.1%
20.0  1  < 0.1%
40.0  4  < 0.1%
80.0  3  < 0.1%
120.0  1  < 0.1%
160.0  1  < 0.1%
200.0  1  < 0.1%
320.0  3  < 0.1%
360.0  4  < 0.1%
400.0  4  < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
20756080.0  1  < 0.1%
20447000.0  1  < 0.1%
20075000.0  1  < 0.1%
18161280.0  1  < 0.1%
17180400.0  1  < 0.1%
16235000.0  1  < 0.1%
15599360.0  1  < 0.1%
15247240.0  1  < 0.1%
15087600.0  1  < 0.1%
15001000.0  1  < 0.1%

Interactions

[Figures: interaction plots between pairs of numeric variables]