Overview

Dataset statistics

Number of variables: 6
Number of observations: 8449
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 420.9 KiB
Average record size in memory: 51.0 B
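The average record size appears to be the total in-memory size divided by the number of rows. The check below is a quick sketch in Python. It assumes 1 KiB = 1024 bytes and uses the rounded figures listed above.

```python
# Figures copied from "Dataset statistics" above (already rounded by the report).
TOTAL_SIZE_KIB = 420.9
N_OBSERVATIONS = 8449


def average_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Mean in-memory bytes per row, assuming 1 KiB = 1024 bytes."""
    return total_kib * 1024 / n_rows


avg_bytes = average_record_size_bytes(TOTAL_SIZE_KIB, N_OBSERVATIONS)
print(f"Average record size in memory: {avg_bytes:.1f} B")
```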

Variable types

Numeric: 3
Text: 1
Categorical: 2

Dataset

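Description: Inspection records for milled grain (정곡) such as rice, managed by the National Agricultural Products Quality Management Service. Fields include application year, city/county, production year, use, country of origin and inspected quantity.
Author: 국립농산물품질관리원 (National Agricultural Products Quality Management Service)
URL: https://data.mafra.go.kr/opendata/data/indexOpenDataDetail.do?data_id=20220204000000001690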

Alerts

신청년도 (application year) is highly overall correlated with 연산 (production year) [High correlation]
연산 (production year) is highly overall correlated with 신청년도 (application year) [High correlation]
용도 (use) is highly imbalanced (95.7%) [Imbalance]

Reproduction

Analysis started: 2024-03-23 07:42:57.737038
Analysis finished: 2024-03-23 07:43:01.611474
Duration: 3.87 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

신청년도 (application year)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 14
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.8919
Minimum: 2009
Maximum: 2022
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 74.4 KiB

Quantile statistics

Minimum: 2009
5th percentile: 2010
Q1: 2013
Median: 2016
Q3: 2019
95th percentile: 2022
Maximum: 2022
Range: 13
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.7548633
Coefficient of variation (CV): 0.0018626313
Kurtosis: -1.164884
Mean: 2015.8919
Median Absolute Deviation (MAD): 3
Skewness: -0.023858296
Sum: 17032271
Variance: 14.098999
Monotonicity: Decreasing
Histogram with fixed size bins (bins=14)
Common Values
Value               Count    Frequency (%)
2019                737      8.7%
2015                720      8.5%
2014                720      8.5%
2018                669      7.9%
2010                648      7.7%
2011                628      7.4%
2016                627      7.4%
2021                626      7.4%
2020                603      7.1%
2017                598      7.1%
Other values (4)    1873     22.2%

Minimum 10 values
Value               Count    Frequency (%)
2009                116      1.4%
2010                648      7.7%
2011                628      7.4%
2012                587      6.9%
2013                586      6.9%
2014                720      8.5%
2015                720      8.5%
2016                627      7.4%
2017                598      7.1%
2018                669      7.9%

Maximum 10 values
Value               Count    Frequency (%)
2022                584      6.9%
2021                626      7.4%
2020                603      7.1%
2019                737      8.7%
2018                669      7.9%
2017                598      7.1%
2016                627      7.4%
2015                720      8.5%
2014                720      8.5%
2013                586      6.9%
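The frequency tables above list every one of the 14 distinct values of 신청년도, so the headline statistics can be recomputed from them. The Python sketch below assumes the report uses the sample variance (divisor n - 1, the pandas default) and linear-interpolation quantiles. Under those assumptions it should reproduce the reported mean, variance, standard deviation, sum and quartiles. The function name summarize_counts is illustrative only.

```python
import math
import statistics

# Every distinct value of 신청년도 with its count, from the tables above.
COUNTS = {
    2009: 116, 2010: 648, 2011: 628, 2012: 587, 2013: 586,
    2014: 720, 2015: 720, 2016: 627, 2017: 598, 2018: 669,
    2019: 737, 2020: 603, 2021: 626, 2022: 584,
}


def summarize_counts(counts: dict[int, int]) -> dict[str, float]:
    """Recompute summary statistics from a value -> count mapping."""
    n = sum(counts.values())
    total = sum(value * c for value, c in counts.items())
    mean = total / n
    # Sample variance (divisor n - 1), weighting each squared deviation by its count.
    variance = sum(c * (value - mean) ** 2 for value, c in counts.items()) / (n - 1)
    # Expand back to one entry per row; method="inclusive" places quantiles at
    # (n - 1) * p, i.e. linear interpolation between order statistics.
    rows = [value for value, c in counts.items() for _ in range(c)]
    q1, median, q3 = statistics.quantiles(rows, n=4, method="inclusive")
    return {
        "n": n,
        "sum": total,
        "mean": mean,
        "variance": variance,
        "std": math.sqrt(variance),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


summary = summarize_counts(COUNTS)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Rebuilding the statistics from counts rather than from the 8449 raw rows is only possible because this column has so few distinct values. For a high-cardinality column such as 검사수량, the tables list just 30 of 5748 distinct values.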

시군 (city/county)
Text

Distinct: 100
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 66.1 KiB

Length

Max length: 12
Median length: 8
Mean length: 8.1109007
Min length: 7

Characters and Unicode

Total characters: 68529
Distinct characters: 94
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
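Python's standard unicodedata module shows how such properties are obtained: it returns the general category of each character. The minimal sketch below counts categories over the five sample rows of 시군. It does not cover scripts or blocks, because unicodedata does not expose those. The mapping to the report's labels ("Other Letter", "Space Separator") is written out by hand and is an assumption. The reported mean length also appears to be the total character count divided by the number of values, and the sketch checks that as well.

```python
import unicodedata
from collections import Counter

# Labels the report uses for the two general categories seen in 시군.
CATEGORY_LABELS = {"Lo": "Other Letter", "Zs": "Space Separator"}


def category_counts(values: list[str]) -> Counter:
    """Count characters by Unicode general category across all values."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)  # e.g. "Lo" for Hangul syllables
            counts[CATEGORY_LABELS.get(code, code)] += 1
    return counts


# The five sample rows of 시군 listed under "Sample".
SAMPLE = ["강원도 강릉시", "강원도 강릉시", "강원도 강릉시", "강원도 고성군", "강원도 고성군"]
print(category_counts(SAMPLE))

# Mean length = total characters / number of values (figures from this report).
mean_length = 68529 / 8449
print(f"Mean length: {mean_length:.7f}")
```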

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 강원도 강릉시
2nd row: 강원도 강릉시
3rd row: 강원도 강릉시
4th row: 강원도 고성군
5th row: 강원도 고성군

Words
Value               Count    Frequency (%)
전라남도            1641     9.4%
경상북도            1540     8.8%
전라북도            967      5.5%
경기도              894      5.1%
경상남도            883      5.1%
충청남도            808      4.6%
충청북도            725      4.2%
강원도              624      3.6%
북구                152      0.9%
논산시              142      0.8%
Other values (108)  9063     52.0%

Most occurring characters

Value               Count    Frequency (%)
(space)             8995     13.1%
                    8138     11.9%
                    4381     6.4%
                    4190     6.1%
                    3655     5.3%
                    3564     5.2%
                    3384     4.9%
                    2761     4.0%
                    2608     3.8%
                    2535     3.7%
Other values (84)   24318    35.5%

Most occurring categories

Value               Count    Frequency (%)
Other Letter        59534    86.9%
Space Separator     8995     13.1%

Most frequent character per category

Other Letter
Value               Count    Frequency (%)
                    8138     13.7%
                    4381     7.4%
                    4190     7.0%
                    3655     6.1%
                    3564     6.0%
                    3384     5.7%
                    2761     4.6%
                    2608     4.4%
                    2535     4.3%
                    1840     3.1%
Other values (83)   22478    37.8%
Space Separator
Value               Count    Frequency (%)
(space)             8995     100.0%

Most occurring scripts

Value               Count    Frequency (%)
Hangul              59534    86.9%
Common              8995     13.1%

Most frequent character per script

Hangul
Value               Count    Frequency (%)
                    8138     13.7%
                    4381     7.4%
                    4190     7.0%
                    3655     6.1%
                    3564     6.0%
                    3384     5.7%
                    2761     4.6%
                    2608     4.4%
                    2535     4.3%
                    1840     3.1%
Other values (83)   22478    37.8%
Common
Value               Count    Frequency (%)
(space)             8995     100.0%

Most occurring blocks

Value               Count    Frequency (%)
Hangul              59534    86.9%
ASCII               8995     13.1%

Most frequent character per block

ASCII
Value               Count    Frequency (%)
(space)             8995     100.0%
Hangul
Value               Count    Frequency (%)
                    8138     13.7%
                    4381     7.4%
                    4190     7.0%
                    3655     6.1%
                    3564     6.0%
                    3384     5.7%
                    2761     4.6%
                    2608     4.4%
                    2535     4.3%
                    1840     3.1%
Other values (83)   22478    37.8%

연산 (production year)
Real number (ℝ)

HIGH CORRELATION 

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2013.7492
Minimum: 2005
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 74.4 KiB

Quantile statistics

Minimum: 2005
5th percentile: 2007
Q1: 2011
Median: 2014
Q3: 2017
95th percentile: 2020
Maximum: 2021
Range: 16
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.0218031
Coefficient of variation (CV): 0.0019971718
Kurtosis: -0.9284781
Mean: 2013.7492
Median Absolute Deviation (MAD): 3
Skewness: -0.10491945
Sum: 17014167
Variance: 16.1749
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)
Common Values
Value               Count    Frequency (%)
2016                722      8.5%
2018                710      8.4%
2014                682      8.1%
2012                680      8.0%
2011                651      7.7%
2013                640      7.6%
2009                617      7.3%
2015                585      6.9%
2010                534      6.3%
2017                532      6.3%
Other values (7)    2096     24.8%

Minimum 10 values
Value               Count    Frequency (%)
2005                145      1.7%
2006                118      1.4%
2007                178      2.1%
2008                460      5.4%
2009                617      7.3%
2010                534      6.3%
2011                651      7.7%
2012                680      8.0%
2013                640      7.6%
2014                682      8.1%

Maximum 10 values
Value               Count    Frequency (%)
2021                172      2.0%
2020                526      6.2%
2019                497      5.9%
2018                710      8.4%
2017                532      6.3%
2016                722      8.5%
2015                585      6.9%
2014                682      8.1%
2013                640      7.6%
2012                680      8.0%
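The Common Values table keeps the 10 most frequent values and folds the remainder into an "Other values (k)" row, with each frequency expressed as a share of all 8449 observations. The Python sketch below reproduces that grouping from the full 연산 counts listed above. The top-10 cut-off mirrors the table, and the function name common_values is illustrative rather than part of any library.

```python
from collections import Counter

# Every distinct value of 연산 with its count, from the tables above.
COUNTS = Counter({
    2005: 145, 2006: 118, 2007: 178, 2008: 460, 2009: 617, 2010: 534,
    2011: 651, 2012: 680, 2013: 640, 2014: 682, 2015: 585, 2016: 722,
    2017: 532, 2018: 710, 2019: 497, 2020: 526, 2021: 172,
})


def common_values(counts: Counter, top: int = 10):
    """Top-`top` values by count, plus (how many, total, %) for the remainder."""
    n = sum(counts.values())
    ranked = counts.most_common()  # all values, highest count first
    rows = [(value, c, 100 * c / n) for value, c in ranked[:top]]
    rest = ranked[top:]
    other_count = sum(c for _, c in rest)
    return rows, len(rest), other_count, 100 * other_count / n


rows, n_other, other_count, other_pct = common_values(COUNTS)
for value, c, pct in rows:
    print(f"{value}  {c}  {pct:.1f}%")
print(f"Other values ({n_other})  {other_count}  {other_pct:.1f}%")
```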

용도 (use)
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 66.1 KiB
정곡: 8389
<NA>: 31
대북: 29

Length

Max length: 4
Median length: 2
Mean length: 2.0073381
Min length: 2

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 정곡
2nd row: 정곡
3rd row: 정곡
4th row: 정곡
5th row: 정곡

Common Values

Value                   Count    Frequency (%)
정곡 (milled grain)     8389     99.3%
<NA>                    31       0.4%
대북 (for North Korea)  29       0.3%
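The 95.7% imbalance alert for 용도 can be reproduced from these counts. This sketch assumes that ydata-profiling scores imbalance as one minus the Shannon entropy of the value distribution, normalised by the log of the number of distinct values. Here `<NA>` is treated as an ordinary category. That is consistent with the report: it lists 0 missing cells and a maximum length of 4 for this column, which fits a literal "<NA>" string rather than a true missing value.

```python
import math

# Value counts for 용도 from the Common Values table above.
COUNTS = {"정곡": 8389, "<NA>": 31, "대북": 29}


def imbalance_score(counts: dict[str, int]) -> float:
    """1 - normalised Shannon entropy: 0 for a uniform split, near 1 for one dominant value."""
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two distinct values")
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values() if c > 0)
    # log(k) is the largest entropy k categories can have (a uniform split).
    return 1 - entropy / math.log(k)


score = imbalance_score(COUNTS)
print(f"Imbalance: {score:.1%}")
```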

Length

Histogram of lengths of the category

Common Values (Plot): bar chart of the counts in the Common Values table.

원산지 (country of origin)
Categorical

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 66.1 KiB
국산: 3940
중국: 1573
미국: 1468
태국: 667
베트남: 336
Other values (12): 465

Length

Max length: 8
Median length: 2
Mean length: 2.1767073
Min length: 2

Unique

Unique: 5
Unique (%): 0.1%

Sample

1st row: 국산
2nd row: 미국
3rd row: 중국
4th row: 국산
5th row: 중국

Common Values

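Value                                Count    Frequency (%)
국산 (domestic)                      3940     46.6%
중국 (China)                         1573     18.6%
미국 (United States)                 1468     17.4%
태국 (Thailand)                      667      7.9%
베트남 (Vietnam)                     336      4.0%
중국(미국) (China (United States))   262      3.1%
호주 (Australia)                     142      1.7%
인도 (India)                         34       0.4%
중국(태국) (China (Thailand))        14       0.2%
중국(호주) (China (Australia))       4        < 0.1%
Other values (7)                     9        0.1%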

Length

Histogram of lengths of the category

검사수량 (inspected quantity)
Real number (ℝ)

Distinct: 5748
Distinct (%): 68.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 927355.47
Minimum: 0
Maximum: 20756080
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 74.4 KiB

Quantile statistics

Minimum: 0
5th percentile: 10320
Q1: 110000
Median: 358080
Q3: 1083600
95th percentile: 3620876
Maximum: 20756080
Range: 20756080
Interquartile range (IQR): 973600

Descriptive statistics

Standard deviation: 1563524.3
Coefficient of variation (CV): 1.6860032
Kurtosis: 30.14651
Mean: 927355.47
Median Absolute Deviation (MAD): 308680
Skewness: 4.4371391
Sum: 7.8352263 × 10⁹
Variance: 2.4446083 × 10¹²
Monotonicity: Not monotonic
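Several of the statistics above are simple functions of others, which makes a quick consistency check possible. The Python sketch below recomputes CV, variance, sum, IQR and range from the reported mean, standard deviation, quartiles and extremes. Because the inputs are the report's rounded values, small differences in the last digits are expected.

```python
# Reported figures for 검사수량, copied from the tables above.
N = 8449
MEAN = 927355.47
STD = 1563524.3
Q1, Q3 = 110000, 1083600
MINIMUM, MAXIMUM = 0, 20756080


def derived_statistics(n, mean, std, q1, q3, lo, hi):
    """Statistics that follow arithmetically from other reported values."""
    return {
        "cv": std / mean,      # coefficient of variation
        "variance": std ** 2,  # square of the standard deviation
        "sum": mean * n,       # mean times number of observations
        "iqr": q3 - q1,        # interquartile range
        "range": hi - lo,
    }


derived = derived_statistics(N, MEAN, STD, Q1, Q3, MINIMUM, MAXIMUM)
for name, value in derived.items():
    print(f"{name}: {value:.8g}")
```

The mean (927355.47) is roughly 2.6 times the median (358080). The distribution is therefore strongly right-skewed, which is consistent with the reported skewness of 4.44.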
Histogram with fixed size bins (bins=50)
Common Values
Value               Count    Frequency (%)
100000.0            113      1.3%
200000.0            73       0.9%
50000.0             73       0.9%
150000.0            47       0.6%
30000.0             44       0.5%
10000.0             43       0.5%
20000.0             39       0.5%
40000.0             38       0.4%
60000.0             34       0.4%
300000.0            34       0.4%
Other values (5738) 7911     93.6%

Minimum 10 values
Value               Count    Frequency (%)
0.0                 1        < 0.1%
20.0                1        < 0.1%
40.0                4        < 0.1%
80.0                3        < 0.1%
120.0               1        < 0.1%
200.0               1        < 0.1%
320.0               3        < 0.1%
360.0               4        < 0.1%
400.0               4        < 0.1%
420.0               1        < 0.1%

Maximum 10 values
Value               Count    Frequency (%)
20756080.0          1        < 0.1%
20377000.0          1        < 0.1%
19925000.0          1        < 0.1%
18091280.0          1        < 0.1%
17080400.0          1        < 0.1%
16155000.0          1        < 0.1%
15599360.0          1        < 0.1%
15247240.0          1        < 0.1%
15087600.0          1        < 0.1%
15001000.0          1        < 0.1%
< 0.1%

Interactions

Pairwise interaction plots for the numeric variables 신청년도, 연산 and 검사수량 (3 × 3 panels).