Compositional NNSE data file
Load the file into MATLAB:
load cnnse_lmbdc_0.500000_lmbdl_0.050000_all_data.mat
You should see the following variables:
  Name            Size              Bytes        Class     Attributes
  A               49791x1000        398328000    double
  B_mat           54454x54454       1087144      double    sparse
  B_mat_inds      54454x54454       1087144      double    sparse
  D               1000x1000         8000000      double
  X               54454x2000        871264000    double
  test_inds       1x4663            37304        double
  train_inds      1x49791           398328       double
  words           54454x1           7555970      cell
X is the original input data; train_inds and test_inds are the row indices of X used for training and testing the model. words lists the word or phrase for each row of X.
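As a minimal sketch of how words, X, and train_inds relate, the snippet below looks up one entry by its label and checks whether its row was used for training. The small toy variables are stand-ins (not the real data) so the snippet runs without the .mat file; the names query, row, x_row, and in_train are illustrative only.

```matlab
% Toy stand-ins so this runs on its own; after loading the .mat file,
% drop these four lines and use the real words, X, and train_inds.
words      = {'volcanic/JJ'; 'rocks/NNS'; 'volcanic/JJ_rocks/NNS'};
X          = [1 2; 3 4; 5 6];
train_inds = [1 2];
test_inds  = 3;

query = 'rocks/NNS';                  % label to look up, in word/TAG form
row   = find(strcmp(words, query));   % row of X for that label (empty if absent)
if isempty(row)
    error('%s not found in words', query);
end
x_row    = X(row, :);                 % original input vector for that entry
in_train = ismember(row, train_inds); % true if the row was used for training
```

With the real data, the same strcmp lookup works for phrase labels such as 'volcanic/JJ_rocks/NNS'.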
A and D are the matrices learned by CNNSE. If you're looking for embeddings, they are in matrix A. B_mat is the matrix used during training; its non-zero elements are all 0.5 or 1. B_mat_inds is a matrix you can use to find the adjective, noun, and phrase rows for a particular phrase. For example, row 40467 is the first adjective-noun phrase in the matrix:
B_mat_inds(40467,:)
ans =
(1,30596) 3
(1,39138) 2
(1,40467) 1
This tells us that the phrase corresponds to row 40467, the adjective is row 39138, and the noun is row 30596:
words([40467 39138 30596])
ans =
'volcanic/JJ_rocks/NNS'
'volcanic/JJ'
'rocks/NNS'
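The lookup above can be sketched for an arbitrary phrase row as follows. It assumes, based only on the single example shown, that a value of 1 in B_mat_inds marks the phrase itself, 2 the adjective, and 3 the noun. The toy words and B_mat_inds are stand-ins so the snippet runs without the .mat file, and the variable names are illustrative.

```matlab
% Toy stand-ins: row 3 is a phrase whose adjective is row 1 and noun is row 2.
% With the real data, drop these two lines and set phrase_row = 40467.
words      = {'volcanic/JJ'; 'rocks/NNS'; 'volcanic/JJ_rocks/NNS'};
B_mat_inds = sparse([3 3 3], [3 1 2], [1 2 3], 3, 3);

phrase_row = 3;

% Column indices and stored values of the non-zero entries in that row
[~, cols, codes] = find(B_mat_inds(phrase_row, :));

% Assumed coding (inferred from the example above): 1 = phrase, 2 = adjective, 3 = noun
self_row = cols(codes == 1);
adj_row  = cols(codes == 2);
noun_row = cols(codes == 3);

parts = words([self_row adj_row noun_row]);   % phrase, adjective, noun labels
```

If a row of B_mat_inds has no non-zero entries, cols is empty; that presumably means the row is a single word rather than a phrase, though this is not stated above.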
Email Alona with questions: amfyshe at_sign gmail.com