parent 59184662de
commit 9a9cfd5444
@ -0,0 +1,170 @@
SUMMARY
================================================================================

These files contain 1,000,209 anonymous ratings of approximately 3,900 movies
made by 6,040 MovieLens users who joined MovieLens in 2000.

USAGE LICENSE
================================================================================

Neither the University of Minnesota nor any of the researchers
involved can guarantee the correctness of the data, its suitability
for any particular purpose, or the validity of results based on the
use of the data set. The data set may be used for any research
purposes under the following conditions:

     * The user may not state or imply any endorsement from the
       University of Minnesota or the GroupLens Research Group.

     * The user must acknowledge the use of the data set in
       publications resulting from the use of the data set
       (see below for citation information).

     * The user may not redistribute the data without separate
       permission.

     * The user may not use this information for any commercial or
       revenue-bearing purposes without first obtaining permission
       from a faculty member of the GroupLens Research Project at the
       University of Minnesota.

If you have any further questions or comments, please contact GroupLens
<grouplens-info@cs.umn.edu>.

CITATION
================================================================================

To acknowledge use of the dataset in publications, please cite the following
paper:

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History
and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4,
Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872

ACKNOWLEDGEMENTS
================================================================================

Thanks to Shyong Lam and Jon Herlocker for cleaning up and generating the data
set.

FURTHER INFORMATION ABOUT THE GROUPLENS RESEARCH PROJECT
================================================================================

The GroupLens Research Project is a research group in the Department of
Computer Science and Engineering at the University of Minnesota. Members of
the GroupLens Research Project are involved in many research projects related
to the fields of information filtering, collaborative filtering, and
recommender systems. The project is led by professors John Riedl and Joseph
Konstan. The project began to explore automated collaborative filtering in
1992, but is best known for its worldwide trial of an automated
collaborative filtering system for Usenet news in 1996. Since then the project
has expanded its scope to research overall information filtering solutions,
integrating content-based methods as well as improving current collaborative
filtering technology.

Further information on the GroupLens Research project, including research
publications, can be found at the following web site:

        http://www.grouplens.org/

GroupLens Research currently operates a movie recommender based on
collaborative filtering:

        http://www.movielens.org/

RATINGS FILE DESCRIPTION
================================================================================

All ratings are contained in the file "ratings.dat" and are in the
following format:

UserID::MovieID::Rating::Timestamp

- UserIDs range between 1 and 6040
- MovieIDs range between 1 and 3952
- Ratings are made on a 5-star scale (whole-star ratings only)
- Timestamp is represented in seconds since the epoch as returned by time(2)
- Each user has at least 20 ratings

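The record format above can be read with a few lines of Python. This is a
minimal sketch, not part of the data set: the names Rating and parse_rating
are hypothetical, and the sample record is only an illustration of the
documented UserID::MovieID::Rating::Timestamp layout.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Rating(NamedTuple):
    """One record of ratings.dat (hypothetical helper type)."""
    user_id: int
    movie_id: int
    rating: int      # whole stars, 1 to 5
    timestamp: int   # seconds since the Unix epoch


def parse_rating(line: str) -> Rating:
    """Parse one 'UserID::MovieID::Rating::Timestamp' record."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    return Rating(int(user_id), int(movie_id), int(rating), int(timestamp))


# Illustrative record in the documented format.
sample = parse_rating("1::1193::5::978300760")

# Convert the epoch timestamp to a timezone-aware UTC datetime.
rated_at = datetime.fromtimestamp(sample.timestamp, tz=timezone.utc)
print(sample, rated_at.isoformat())
```

Splitting on the literal "::" avoids treating the separator as a regular
expression, which some CSV readers would otherwise do.
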
USERS FILE DESCRIPTION
================================================================================

User information is in the file "users.dat" and is in the following
format:

UserID::Gender::Age::Occupation::Zip-code

All demographic information is provided voluntarily by the users and is
not checked for accuracy. Only users who have provided some demographic
information are included in this data set.

- Gender is denoted by a "M" for male and "F" for female
- Age is chosen from the following ranges:

        *  1:  "Under 18"
        * 18:  "18-24"
        * 25:  "25-34"
        * 35:  "35-44"
        * 45:  "45-49"
        * 50:  "50-55"
        * 56:  "56+"

- Occupation is chosen from the following choices:

        *  0:  "other" or not specified
        *  1:  "academic/educator"
        *  2:  "artist"
        *  3:  "clerical/admin"
        *  4:  "college/grad student"
        *  5:  "customer service"
        *  6:  "doctor/health care"
        *  7:  "executive/managerial"
        *  8:  "farmer"
        *  9:  "homemaker"
        * 10:  "K-12 student"
        * 11:  "lawyer"
        * 12:  "programmer"
        * 13:  "retired"
        * 14:  "sales/marketing"
        * 15:  "scientist"
        * 16:  "self-employed"
        * 17:  "technician/engineer"
        * 18:  "tradesman/craftsman"
        * 19:  "unemployed"
        * 20:  "writer"

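The age and occupation codes listed above can be decoded with simple lookup
tables. The sketch below is an illustration only: parse_user, AGE_GROUPS and
OCCUPATIONS are hypothetical names, and the sample record is invented to
match the documented UserID::Gender::Age::Occupation::Zip-code layout.

```python
# Age codes are the lower bound of each range, as listed in the README.
AGE_GROUPS = {
    1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44",
    45: "45-49", 50: "50-55", 56: "56+",
}

# Occupation codes 0..20, in the order given in the README.
OCCUPATIONS = [
    "other or not specified", "academic/educator", "artist", "clerical/admin",
    "college/grad student", "customer service", "doctor/health care",
    "executive/managerial", "farmer", "homemaker", "K-12 student", "lawyer",
    "programmer", "retired", "sales/marketing", "scientist", "self-employed",
    "technician/engineer", "tradesman/craftsman", "unemployed", "writer",
]


def parse_user(line):
    """Parse one users.dat record and decode its age and occupation codes."""
    user_id, gender, age, occupation, zip_code = line.strip().split("::")
    return {
        "user_id": int(user_id),
        "gender": gender,                      # "M" or "F"
        "age_group": AGE_GROUPS[int(age)],
        "occupation": OCCUPATIONS[int(occupation)],
        "zip_code": zip_code,                  # kept as text to preserve leading zeros
    }


# Invented record, for illustration only.
user = parse_user("42::M::25::12::55455")
print(user)
```
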
MOVIES FILE DESCRIPTION
================================================================================

Movie information is in the file "movies.dat" and is in the following
format:

MovieID::Title::Genres

- Titles are identical to titles provided by the IMDB (including
  year of release)
- Genres are pipe-separated and are selected from the following genres:

        * Action
        * Adventure
        * Animation
        * Children's
        * Comedy
        * Crime
        * Documentary
        * Drama
        * Fantasy
        * Film-Noir
        * Horror
        * Musical
        * Mystery
        * Romance
        * Sci-Fi
        * Thriller
        * War
        * Western

- Some MovieIDs do not correspond to a movie due to accidental duplicate
  entries and/or test entries
- Movies are mostly entered by hand, so errors and inconsistencies may exist
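
A movie record can be split into its id, title, release year and genre list
as sketched below. The helper name parse_movie is hypothetical, the sample
record only illustrates the documented MovieID::Title::Genres layout, and
the year extraction assumes the title ends with a four-digit year in
parentheses (hand-entered titles may not, in which case the year is None).

```python
import re

# Matches a trailing "(YYYY)" in the title; an assumption about title layout.
_YEAR = re.compile(r"\((\d{4})\)\s*$")


def parse_movie(line):
    """Parse one 'MovieID::Title::Genres' record."""
    # Split the id off the front and the genres off the back, so a title
    # that happens to contain "::" is still kept in one piece.
    movie_id, rest = line.rstrip("\n").split("::", 1)
    title, genres = rest.rsplit("::", 1)
    match = _YEAR.search(title)
    return {
        "movie_id": int(movie_id),
        "title": title,
        "year": int(match.group(1)) if match else None,
        "genres": genres.split("|"),   # genres are pipe-separated
    }


# Illustrative record in the documented format.
movie = parse_movie("1::Toy Story (1995)::Animation|Children's|Comedy")
print(movie["title"], movie["year"], movie["genres"])
```
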
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
@ -0,0 +1,28 @@
import torch
from torch.utils.data import DataLoader, Dataset


class UserItemRatingDataset(Dataset):
    """
    Wrapper that converts <user, item, rating> tensors into a torch Dataset.
    """

    def __init__(self, user_tensor, item_tensor, target_tensor):
        """
        args:
            user_tensor: torch.Tensor, the user index of each <user, item> pair
            item_tensor: torch.Tensor, the item index of each <user, item> pair
            target_tensor: torch.Tensor, the corresponding rating for each <user, item> pair
        """
        self._user_tensor = user_tensor
        self._item_tensor = item_tensor
        self._target_tensor = target_tensor

    def __getitem__(self, index):
        return self._user_tensor[index], self._item_tensor[index], self._target_tensor[index]

    def __len__(self):
        return self._user_tensor.size(0)


def Construct_DataLoader(users, items, ratings, batchsize):
    # Users, items and ratings are all converted to integer (Long) tensors
    # and served in shuffled mini-batches of size `batchsize`.
    assert batchsize > 0
    dataset = UserItemRatingDataset(user_tensor=torch.LongTensor(users),
                                    item_tensor=torch.LongTensor(items),
                                    target_tensor=torch.LongTensor(ratings))
    return DataLoader(dataset, batch_size=batchsize, shuffle=True)
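
To make the batching behaviour of Construct_DataLoader concrete, here is a
dependency-free sketch of what shuffling and batching aligned
<user, item, rating> sequences amounts to. It is not the torch
implementation: iter_batches and its seed parameter are hypothetical, and
plain lists stand in for tensors.

```python
import random


def iter_batches(users, items, ratings, batch_size, seed=None):
    """Yield shuffled (users, items, ratings) mini-batches.

    Mimics shuffle=True batching: one random permutation per pass, with the
    three sequences indexed by the same positions so triples stay aligned.
    """
    assert batch_size > 0
    assert len(users) == len(items) == len(ratings)
    order = list(range(len(users)))
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield ([users[i] for i in idx],
               [items[i] for i in idx],
               [ratings[i] for i in idx])


# Toy data, invented for illustration.
users = [0, 0, 1, 1, 2]
items = [10, 11, 10, 12, 11]
ratings = [1, 0, 1, 1, 0]

batches = list(iter_batches(users, items, ratings, batch_size=2, seed=0))
for batch_users, batch_items, batch_ratings in batches:
    print(batch_users, batch_items, batch_ratings)
```

With five samples and a batch size of 2, the last batch holds the single
leftover sample, which matches the default behaviour of DataLoader when
drop_last is not set.
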
Binary file not shown.
Binary file not shown.
Binary file not shown.
@ -0,0 +1,120 @@
import sys
import os.path as osp

# Add the parent directory to sys.path so that the NCF package can be imported.
this_dir = osp.dirname(__file__)
lib_path = osp.join(this_dir, '..')
sys.path.insert(0, lib_path)

from NCF.dataprocess import DataProcess
from NCF.network import GMF, MLP, NeuMF
from NCF.trainer import Trainer
import numpy as np
import torch

gmf_config = {'num_epoch': 100,
              'batch_size': 1024,
              'optimizer': 'adam',
              'adam_lr': 1e-3,
              'num_users': 6040,
              'num_items': 3706,
              'latent_dim_gmf': 8,
              'num_negative': 4,
              'layers': [],
              'l2_regularization': 0,  # 0.01
              'pretrain': False,  # do not modify this
              'use_cuda': True,
              'device_id': 0,
              'model_name': '../TrainedModels/NCF_GMF.model'
              }

mlp_config = {'num_epoch': 100,
              'batch_size': 1024,  # 1024,
              'optimizer': 'adam',
              'adam_lr': 1e-3,
              'num_users': 6040,
              'num_items': 3706,
              'latent_dim_mlp': 8,
              'latent_dim_gmf': 8,
              'num_negative': 4,
              'layers': [16, 64, 32, 16, 8],  # layers[0] is the concat of latent user vector & latent item vector
              'l2_regularization': 0.0000001,  # MLP model is sensitive to hyper params
              'use_cuda': True,
              'device_id': 0,
              'pretrain': True,
              'gmf_config': gmf_config,
              'pretrain_gmf': '../TrainedModels/NCF_GMF.model',
              'model_name': '../TrainedModels/NCF_MLP.model'
              }

neumf_config = {'num_epoch': 100,
                'batch_size': 1024,  # 1024
                'optimizer': 'adam',
                'adam_lr': 1e-3,
                'num_users': 6040,
                'num_items': 3706,
                'latent_dim_gmf': 8,
                'latent_dim_mlp': 8,
                'num_negative': 4,
                'layers': [16, 32, 16, 8],  # layers[0] is the dimension of the concatenated user and item latent vectors
                'l2_regularization': 0.01,
                'alpha': 0.5,  # weight that balances the GMF and MLP model parameters
                'use_cuda': True,
                'device_id': 0,
                'pretrain': False,
                'gmf_config': gmf_config,
                'pretrain_gmf': '../TrainedModels/NCF_GMF.model',
                'mlp_config': mlp_config,
                'pretrain_mlp': '../TrainedModels/NCF_MLP.model',
                'model_name': '../TrainedModels/NCF_NeuMF.model'
                }

if __name__ == "__main__":
    ####################################################################################
    # NCF: Neural Collaborative Filtering
    ####################################################################################

    # Load and preprocess the data
    dp = DataProcess("../Data/ml-1m/ratings.dat")

    # Initialize the GMF model
    config = gmf_config
    model = GMF(config, config['latent_dim_gmf'])

    # Initialize the MLP model
    config = mlp_config
    model = MLP(config, config['latent_dim_mlp'])

    # Initialize the NeuMF model
    # (each assignment above is overwritten, so only this model is trained below)
    config = neumf_config
    model = NeuMF(config, config['latent_dim_gmf'], config['latent_dim_mlp'])

    # ###############################################################
    # Model training
    # ###############################################################
    trainer = Trainer(model=model, config=config)
    trainer.train(dp.sample_generator)
    trainer.save()

    # ###############################################################
    # Model testing
    # ###############################################################

    # Load the data set
    # dp = DataProcess("../Data/ml-1m/ratings.dat")

    config = neumf_config
    neumf = NeuMF(config, config['latent_dim_gmf'], config['latent_dim_mlp'])
    state_dict = torch.load("../TrainedModels/NCF_NeuMF.model", map_location=torch.device('cpu'))
    neumf.load_state_dict(state_dict, strict=False)

    # Predict how much user User_id likes each item
    User_id = 1
    num_items = config['num_items']
    result = np.zeros(num_items)
    for j in range(num_items):
        score = neumf.forward(torch.LongTensor([User_id]), torch.LongTensor([j]))
        # Detach from the autograd graph before converting to NumPy
        score = score.detach().numpy()
        result[j] = score[0][0]

    # Recommend the N movie ids with the highest predicted preference for User_id
    N = 5
    indices = np.argsort(-result)[:N]
    print(indices)
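
The final step of the script, picking the N highest-scoring item indices,
can be illustrated without NumPy. This is a minimal sketch using the
standard library: top_n_items is a hypothetical helper and the scores are
made up, so it shows only the selection logic, not the model's output.

```python
import heapq


def top_n_items(scores, n):
    """Return the indices of the n highest scores, best first."""
    return heapq.nlargest(n, range(len(scores)), key=scores.__getitem__)


# Made-up scores for five items.
scores = [0.1, 0.9, 0.3, 0.7, 0.5]
print(top_n_items(scores, 3))  # [1, 3, 4]
```

For small N, heapq.nlargest avoids sorting the whole score vector, whereas
np.argsort(-result)[:N] sorts all items first; both give the best-first
order when the scores are distinct.
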