Raw Data Statistics

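Summary: The raw data was collected from Meituan food search results for the keyword 奶茶 (milk tea). The search covered municipalities, first-tier cities, and provincial capitals nationwide. For each city, the first 30 pages of search results were taken, about 1000 shop records.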

Sheet I: Best-selling products recommended by each shop


 Dataset info

Number of variables: 6
Number of observations: 105705
Missing cells: 0 (0.0%)
Duplicate rows: 4024 (3.8%)
Total size: 4.8 MB
Average size: 48.0 B

 Variables types

Numeric: 3
Categorical: 2
Boolean: 0
Date: 0
URL: 0
Text (Unique): 0
Rejected: 1

 Variables Description

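Name     Description                             Std      Mean     Median
title    Product title (string)                  -        -        -
price    Listed price (numeric)                  73.21    32.51    14.00
value    Original price (numeric)                107.44   46.05    18.00
sales    Sales volume (numeric)                  3.16e+5  1.18e+5  5.40e+3
shop_id  Shop ID matching Sheet II (identifier)  -        -        -
city     City (string)                           -        -        -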

 Warnings

Dataset has 4024 (3.8%) duplicate rows [Warning]
price is highly skewed (γ1 = 34.17525054) [Skewed]
sales has 3092 (2.9%) zeros [Zeros]
title has a high cardinality: 20916 distinct values [Warning]
value is highly correlated with price (ρ = 0.9631010987) [Rejected]
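
The checks behind these warnings can be sketched in a few lines of standard-library Python. This is an illustration on made-up rows, not the original data. The helper names (`skewness`, `pearson`, `count_duplicates`) and all sample values are assumptions. The profiling tool may also use a bias-corrected skewness estimator, so figures computed this way need not match the reported γ1 exactly.

```python
from collections import Counter
from statistics import mean, pstdev


def skewness(xs):
    """Population skewness (third standardized moment) of a sequence."""
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def count_duplicates(rows):
    """Number of rows that repeat an earlier, fully identical row."""
    return sum(c - 1 for c in Counter(rows).values())


# Made-up rows in Sheet I column order:
# (title, price, value, sales, shop_id, city)
rows = [
    ("A", 10.0, 12.0, 0, 1, "Beijing"),
    ("B", 12.0, 15.0, 120, 1, "Beijing"),
    ("B", 12.0, 15.0, 120, 1, "Beijing"),  # exact duplicate of the row above
    ("C", 15.0, 18.0, 5400, 2, "Shanghai"),
    ("D", 200.0, 260.0, 30, 3, "Guangzhou"),
]
prices = [r[1] for r in rows]
values = [r[2] for r in rows]
sales = [r[3] for r in rows]

duplicates = count_duplicates(rows)          # duplicate-row warning
zero_share = sales.count(0) / len(sales)     # zeros warning
distinct_titles = len({r[0] for r in rows})  # cardinality warning
price_skew = skewness(prices)                # skewness warning
price_value_corr = pearson(prices, values)   # correlation (rejection) warning

print(duplicates, zero_share, distinct_titles)
print(round(price_skew, 2), round(price_value_corr, 4))
```

A single very expensive item is enough to produce both a large positive skew in price and a correlation with value close to 1. This is one reason to inspect outliers before trusting either figure.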

Sheet II: Shop information for each city


 Dataset info

Number of variables: 14
Number of observations: 30145
Missing cells: 2039 (0.5%)
Duplicate rows: 2218 (7.4%)
Total size: 3.0 MB
Average size: 105.0 B
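
The missing-cell percentage is taken over all cells (rows times variables), not over rows. That is why it looks much smaller than the per-column missing rate reported for phone in the warnings. A short arithmetic check using only the counts from this report (the variable names are illustrative):

```python
# Counts taken from the Sheet II dataset info
n_rows, n_cols = 30145, 14
missing_cells = 2039
duplicate_rows = 2218

cell_share = missing_cells / (n_rows * n_cols)  # share of all cells
row_share = missing_cells / n_rows              # share of rows if all gaps sit in one column
dup_share = duplicate_rows / n_rows             # share of duplicate rows

print(f"{cell_share:.1%} {row_share:.1%} {dup_share:.1%}")
# prints: 0.5% 6.8% 7.4%
```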

 Variables types

Numeric: 6
Categorical: 7
Boolean: 1
Date: 0
URL: 0
Text (Unique): 0
Rejected: 0

 Variables Description

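Name          Description                             Std     Mean     Median
id            Shop ID (identifier)                    -       -        -
title         Shop name (string)                      -       -        -
address       Address (string)                        -       -        -
avgprice      Average product price (numeric)         47.19   18.35    14.00
latitude      Latitude (numeric)                      6.80    33.05    31.95
longitude     Longitude (numeric)                     8.32    112.35   113.64
avgscore      Average rating (numeric)                1.39    3.79     4.00
comments      Number of comments (numeric)            1675    358      22
backCateName  Shop category (string)                  -       -        -
areaname      Area (string)                           -       -        -
phone         Contact phone                           -       -        -
region        Administrative region (string)          -       -        -
isMain        Tea drinks as main business (boolean)   -       69.2%    -
city          City (string)                           -       -        -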
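
Since shop_id in Sheet I refers to id in Sheet II, the two sheets can be combined with a left join on that key. The sketch below uses made-up records, and the function name `join_products_to_shops` is hypothetical. It assumes that id identifies a shop uniquely once duplicate rows are collapsed, which the report does not state.

```python
# Made-up Sheet II records keyed by shop id (values are illustrative only).
# Keying by id keeps one record per shop, so the duplicate rows in Sheet II
# cannot multiply product rows during the join.
shops = {
    1: {"title": "Shop A", "city": "Beijing", "isMain": True},
    2: {"title": "Shop B", "city": "Shanghai", "isMain": False},
}

# Made-up Sheet I records
products = [
    {"title": "Milk tea", "price": 14.0, "shop_id": 1},
    {"title": "Fruit tea", "price": 18.0, "shop_id": 2},
    {"title": "Coffee", "price": 20.0, "shop_id": 99},  # no matching shop
]


def join_products_to_shops(products, shops):
    """Left join: keep every product and attach shop fields when the id matches."""
    joined = []
    for product in products:
        shop = shops.get(product["shop_id"])
        joined.append({
            **product,
            "shop_title": shop["title"] if shop else None,
            "shop_isMain": shop["isMain"] if shop else None,
        })
    return joined


joined = join_products_to_shops(products, shops)
for row in joined:
    print(row)
```

Products whose shop_id has no match keep None in the shop fields instead of being dropped, so unmatched rows stay visible for inspection.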

 Warnings

Dataset has 2218 (7.4%) duplicate rows [Warning]
address has a high cardinality: 27651 distinct values [Warning]
areaname has a high cardinality: 3280 distinct values [Warning]
avgprice is highly skewed (γ1 = 94.30320468) [Skewed]
avgprice has 5963 (19.8%) zeros [Zeros]
avgscore has 3063 (10.2%) zeros [Zeros]
backCateName has a high cardinality: 220 distinct values [Warning]
comments has 9078 (30.1%) zeros [Zeros]
phone has a high cardinality: 25710 distinct values [Warning]
phone has 2039 (6.8%) missing values [Missing]
region has a high cardinality: 2116 distinct values [Warning]
title has a high cardinality: 9027 distinct values [Warning]
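
One possible cleaning pass suggested by these warnings is sketched below on made-up rows: drop exact duplicate rows, then treat zeros in avgprice and avgscore as missing before computing averages. Reading those zeros as "not available" is an assumption and is not stated by the report. The helper names are hypothetical.

```python
from statistics import mean


def drop_duplicates(rows):
    """Keep the first occurrence of each fully identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def zeros_to_missing(rows, columns):
    """Return copies of the rows with 0 replaced by None in the given columns."""
    return [
        {k: (None if k in columns and v == 0 else v) for k, v in row.items()}
        for row in rows
    ]


def mean_of(rows, column):
    """Mean over the rows where the column is not None."""
    return mean(r[column] for r in rows if r[column] is not None)


# Made-up Sheet II rows (illustrative values only)
rows = [
    {"id": 1, "avgprice": 14.0, "avgscore": 4.5, "phone": "010-0000"},
    {"id": 1, "avgprice": 14.0, "avgscore": 4.5, "phone": "010-0000"},  # duplicate
    {"id": 2, "avgprice": 0.0, "avgscore": 0.0, "phone": None},         # zeros, no phone
    {"id": 3, "avgprice": 22.0, "avgscore": 4.0, "phone": "021-0000"},
]

unique = drop_duplicates(rows)
# Assumption: a zero price or rating means the value was unavailable
cleaned = zeros_to_missing(unique, {"avgprice", "avgscore"})

raw_mean = mean_of(unique, "avgprice")     # zeros pull the mean down
clean_mean = mean_of(cleaned, "avgprice")  # zeros excluded
print(len(unique), raw_mean, clean_mean)
```

Whether zeros in comments should get the same treatment is a separate judgment call, because a shop can genuinely have no comments.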