I am doing research on concept extraction across different text types, or genres.
I am looking for freely available research corpora in the following genres:
1) FAQs (I have already downloaded some small collections, but I would like a broader range of topics)
2) Chat log transcripts (I have already downloaded the NPS Collection, 3 Codiac datasets and several smallish Many Eyes datasets)
3) Telephone conversation transcripts (missing)
4) Emails (I have already downloaded the Enron dataset and a couple of junk mail collections)
5) Tweets (missing; the Edinburgh Twitter corpus is apparently no longer available)
6) Corporate weblogs (missing)
I will be glad to share all the links and related documentation once I have covered every genre on the list.
Thanks in advance for your suggestions.