Table of harshness scores for Enron emails, generated by OASYS: harshness.enron (36MB).
Each row has a filename (followed by a colon), a score computed by the DWTF algorithm, a score by the Template_No_Topic algorithm, a score by the TF_No_Topic algorithm, and a hybrid score computed from the preceding three scores. Based on my spot check, I doubt the accuracy of the hybrid scores but find the TF_No_Topic scores more or less believable. (Reference: Cesarano, Bonnie Dorr, Antonio Picariello, Diego Reforgiato, Amelia Sagoff, V.S. Subrahmanian (2006), OASYS: An Opinion Analysis System. AAAI-CAAW 2006, Palo Alto, CA.)
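The row layout described above can be read with a short parser. This is only a sketch: it assumes the four scores are whitespace-separated numbers that follow the colon, which is not stated here and should be checked against the actual file. The names HarshnessRow and parse_harshness_line are made up for illustration.

```python
from typing import NamedTuple


class HarshnessRow(NamedTuple):
    """One row of harshness.enron (field names are illustrative)."""
    filename: str
    dwtf: float
    template_no_topic: float
    tf_no_topic: float
    hybrid: float


def parse_harshness_line(line: str) -> HarshnessRow:
    # Split on the LAST colon so that a colon inside a path cannot
    # be mistaken for the filename/score separator.
    filename, _, rest = line.rpartition(":")
    # ASSUMPTION: scores are separated by whitespace.
    scores = [float(token) for token in rest.split()]
    if not filename or len(scores) != 4:
        raise ValueError(f"unexpected row layout: {line!r}")
    return HarshnessRow(filename.strip(), *scores)


if __name__ == "__main__":
    # Made-up score values, for illustration only.
    row = parse_harshness_line("/allen-p/inbox/1.: 0.1 0.2 0.3 0.4")
    print(row.filename, row.tf_no_topic)
```

Keeping the four scores as named fields makes it easy to use only the TF_No_Topic column and ignore the hybrid one.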
Table of <filename messageID> and table of <messageID personNames> generated from the Enron email corpus.
- Table of <messageID personNames>: enronMsgIDPersonNames.txt (52M).
There are duplicated messageIDs in the table because multiple emails may use the same messageID. That is, messageIDs
are not unique identifiers of email messages; filenames are.
- Here is the list of messageIDs that have duplicates and their corresponding filenames:
duplicateMsgIDTable.txt
- Here is the table of <filename messageID> with duplicated messageIDs:
enronFilenameMsgIDTable.txt(35M).
- Here is the table of <filename messageID> without duplicated messageIDs:
enronFilenameMsgIDTableNoDup.txt(35M). Duplicated messageIDs were removed
according to the order in which the system read the Enron corpus's "maildir" directory and processed its subdirectories.
That is, if a messageID has already been used by a message (identified by the system as a filename, such as
/allen-p/inbox/1.), any files with the same messageID that the system reads later are ignored. Note that the system
does not necessarily read the "allen-p" subdirectory before the "martin-t" subdirectory.
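The "first file read wins" rule above can be sketched as follows. This is not the original system's code; it assumes the table has already been loaded as (filename, messageID) pairs in the order the files were read, and the function names and sample messageIDs are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (filename, messageID)


def dedupe_first_seen(pairs: Iterable[Pair]) -> List[Pair]:
    """Keep only the first filename seen for each messageID."""
    seen = set()
    kept = []
    for filename, msg_id in pairs:
        if msg_id in seen:
            continue  # a file read earlier already claimed this messageID
        seen.add(msg_id)
        kept.append((filename, msg_id))
    return kept


def duplicate_groups(pairs: Iterable[Pair]) -> Dict[str, List[str]]:
    """Map each messageID used by more than one file to its filenames."""
    groups: Dict[str, List[str]] = {}
    for filename, msg_id in pairs:
        groups.setdefault(msg_id, []).append(filename)
    return {m: files for m, files in groups.items() if len(files) > 1}


if __name__ == "__main__":
    # Made-up messageIDs, for illustration only.
    table = [
        ("/allen-p/inbox/1.", "<id-A>"),
        ("/martin-t/inbox/7.", "<id-A>"),
        ("/allen-p/inbox/2.", "<id-B>"),
    ]
    print(dedupe_first_seen(table))
    print(duplicate_groups(table))
```

Because the result depends on read order, rerunning the deduplication with a different directory traversal order can keep a different filename for the same messageID.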
Lists mapping from email addresses to mentioned names. These are extracted from NameSearchAddress.out, filtered down to ONLY those cases in which there is a single unique email address for the mentioned name
(i.e., not zero, and not more than one).
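The filter described above (keep a mentioned name only when it has exactly one distinct email address) can be sketched as follows. The layout of NameSearchAddress.out is not described here, so this sketch assumes the data has already been parsed into (mentioned name, email address) pairs, with an empty address standing for "no address found". The function name and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def unambiguous_names(
    pairs: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, List[str]]:
    """Return {email address: [mentioned names]} for names that have
    exactly one distinct address (not zero, not more than one)."""
    addresses_by_name = defaultdict(set)
    for name, address in pairs:
        addresses = addresses_by_name[name]  # registers the name even with no address
        if address:
            addresses.add(address)

    names_by_address = defaultdict(list)
    for name, addresses in addresses_by_name.items():
        if len(addresses) == 1:
            (address,) = addresses
            names_by_address[address].append(name)
    return {addr: sorted(names) for addr, names in names_by_address.items()}


if __name__ == "__main__":
    # Made-up names and addresses, for illustration only.
    sample = [
        ("Phillip", "a@example.com"),
        ("Phillip", "a@example.com"),  # repeated pair, still one address
        ("Tom", "b@example.com"),
        ("Tom", "c@example.com"),      # two addresses: ambiguous, dropped
        ("Unknown", None),             # zero addresses: dropped
    ]
    print(unambiguous_names(sample))
```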
Enron email corpus annotated by LingPipe: Annotated Enron corpus (496M).
Yejun Wu (wuyj AT glue DOT umd DOT edu)
3/31/06