Table of harshness scores of Enron emails generated by OASYS: harshness.enron (36MB).
Each row contains a filename (followed by a colon), a score computed by the DWTF algorithm, a score computed by the Template_No_Topic algorithm, a score computed by the TF_No_Topic algorithm, and a hybrid score that combines the three preceding scores. Personally I doubt the accuracy of the hybrid scores, but based on my spot check I consider the TF_No_Topic scores roughly reliable. (Reference: Cesarano, Bonnie Dorr, Antonio Picariello, Diego Reforgiato, Amelia Sagoff, V.S. Subrahmanian (2006), OASYS: An Opinion Analysis System. AAAI-CAAW 2006, Palo Alto, CA.)
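A row of harshness.enron can be read with a small parser. This is a minimal sketch that assumes the four scores follow the colon as whitespace-separated numbers and that filenames contain no colon. Neither assumption is confirmed by this page, and the names `HarshnessRow` and `parse_harshness_line` are hypothetical.

```python
from typing import NamedTuple


class HarshnessRow(NamedTuple):
    # Hypothetical record type; field order follows the column description above.
    filename: str
    dwtf: float
    template_no_topic: float
    tf_no_topic: float
    hybrid: float


def parse_harshness_line(line: str) -> HarshnessRow:
    """Parse '<filename>: <DWTF> <Template_No_Topic> <TF_No_Topic> <hybrid>'.

    Assumption: scores are whitespace-separated and the filename has no colon.
    """
    filename, _, rest = line.partition(":")
    scores = [float(tok) for tok in rest.split()]
    if len(scores) != 4:
        raise ValueError(f"expected 4 scores, got {len(scores)}: {line!r}")
    return HarshnessRow(filename.strip(), *scores)
```

Given the spot-check remark above, a reader might rank messages by the `tf_no_topic` field rather than by `hybrid`.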
Table of <filename messageID> and table of <messageID personNames> generated from the Enron email corpus.
 - Table of <messageID personNames>: enronMsgIDPersonNames.txt (52M).
    The table contains duplicated messageIDs because multiple emails may share the same messageID. That is, messageIDs
    are not unique identifiers of email messages; filenames are.
 - Here is the list of messageIDs that have duplicates and their corresponding filenames: 
    duplicateMsgIDTable.txt
 - Here is the table of <filename messageID> including duplicated messageIDs: 
    enronFilenameMsgIDTable.txt (35M).
 - Here is the table of <filename messageID> with duplicated messageIDs removed:
    enronFilenameMsgIDTableNoDup.txt (35M). Duplicated messageIDs were removed
    according to the order in which the system read the Enron corpus' "maildir" directory and processed its subdirectories.
    That is, if a messageID has already been used by a message (which the system identifies by its filename, such as
    /allen-p/inbox/1.), any files with the same messageID that the system reads later are ignored. Note that the system
    does not necessarily read the "allen-p" subdirectory before the "martin-t" subdirectory.
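The first-occurrence rule described above can be sketched as follows. This sketch assumes each line of enronFilenameMsgIDTable.txt holds a filename and a messageID separated by whitespace (not confirmed by this page); the helper names `read_pairs` and `split_duplicates` are hypothetical. Because only the first file seen for each messageID is kept, the result depends on the order of the input, so feeding files in a different order than the original system did may keep a different copy.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def read_pairs(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (filename, messageID) from lines assumed to look like '<filename> <messageID>'."""
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            yield parts[0], parts[1]


def split_duplicates(pairs: Iterable[tuple[str, str]]):
    """Return (nodup, dups) for (filename, messageID) pairs given in reading order.

    nodup: pairs keeping only the first filename read for each messageID.
    dups:  messageID -> all filenames using it, for messageIDs used more than once.
    """
    seen: set[str] = set()
    files_by_id: dict[str, list[str]] = defaultdict(list)
    nodup: list[tuple[str, str]] = []
    for filename, msg_id in pairs:
        files_by_id[msg_id].append(filename)
        if msg_id not in seen:  # first occurrence wins; later files are ignored
            seen.add(msg_id)
            nodup.append((filename, msg_id))
    dups = {m: files for m, files in files_by_id.items() if len(files) > 1}
    return nodup, dups
```

The `dups` mapping corresponds in spirit to duplicateMsgIDTable.txt, and `nodup` to enronFilenameMsgIDTableNoDup.txt.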
 
Lists mapping email addresses to mentioned names. These are extracted from NameSearchAddress.out, filtered down ONLY to cases in which there is a single unique email address for the mentioned name
(i.e., not zero, and not more than one).
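The "exactly one unique address" filter can be sketched as below. The layout of NameSearchAddress.out is not described here, so this sketch assumes it has already been turned into (mentioned name, candidate address) pairs; the function name `unique_address_mapping` is hypothetical. Names with no address never appear in such pairs, so the zero case drops out naturally, and names with two or more distinct addresses are discarded.

```python
from collections import defaultdict
from typing import Iterable


def unique_address_mapping(pairs: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Map address -> sorted names, keeping only names with exactly one distinct address.

    pairs: (mentioned_name, email_address) candidates (assumed input shape).
    """
    addrs_by_name: dict[str, set[str]] = defaultdict(set)
    for name, addr in pairs:
        if addr:  # ignore empty candidates, i.e. the "zero addresses" case
            addrs_by_name[name].add(addr)

    names_by_addr: dict[str, list[str]] = defaultdict(list)
    for name, addrs in addrs_by_name.items():
        if len(addrs) == 1:  # a single unique address: keep the name
            (addr,) = addrs
            names_by_addr[addr].append(name)
    return {addr: sorted(names) for addr, names in names_by_addr.items()}
```

Repeated occurrences of the same (name, address) pair are collapsed by the set, so they do not count as "more than one" address.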
Enron email corpus annotated by LingPipe: Annotated Enron corpus (496M). 
Yejun Wu (wuyj AT glue DOT umd DOT edu) 
3/31/06