Professor Martin Weisser, who teaches at our university, gave a lecture entitled "Advanced corpus linguistics through simple XML (Tools)" at the National Key Research Center for Linguistics & Applied Linguistics, Guangdong University of Foreign Studies, on June 12, 2014.
Prof. Weisser first introduced XML (eXtensible Markup Language), which can be used to mark up linguistic information and shares similarities with SQL databases, SGML and HTML, although SGML and HTML serve different functions. XML tagging consists of a "head tag" and a "tail tag", each enclosed in a pair of angle brackets. The head tag can include attribute names, such as ID, and attribute values. One advantage of XML is customization: tags can be defined to suit various kinds of linguistic annotation. Current tools for corpus annotation and transformation are quite hard to learn and error-prone. Prof. Weisser therefore proposed a "simple XML" approach, which has the following benefits: it (1) uses fewer nested elements; (2) relegates more information to attributes; (3) avoids excessive metadata in headers; (4) keeps enclosing tags and text separate; (5) improves readability; and (6) improves editability. Prof. Weisser then briefly introduced seven corpus software tools, freely available on his personal website. They cover "XML tagging and concordancing", "speech act annotation and research", "text feature extraction", and "phonetic transcription".
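To make the "simple XML" idea more concrete, the following is a minimal sketch of what such markup might look like and how it could be read with Python's standard library. The element names (dialogue, turn) and attribute names (id, n, speaker, speech_act) are hypothetical and chosen only for illustration; they are not taken from the lecture or from Prof. Weisser's tools. The sample tries to reflect the benefits listed above: nesting is shallow, the annotation is carried by attributes on the head tag, there is no header block, and tags and text sit on separate lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical "simple XML" fragment. Element and attribute names are
# illustrative assumptions, not the tag set used in Prof. Weisser's tools.
SAMPLE = """\
<dialogue id="d01">
<turn n="1" speaker="A" speech_act="greet">
good morning
</turn>
<turn n="2" speaker="B" speech_act="request">
could you open the file please
</turn>
</dialogue>
"""


def read_turns(xml_text):
    """Return a (speaker, speech_act, text) tuple for each <turn> element."""
    root = ET.fromstring(xml_text)
    return [
        # Annotation lives in attributes; the element text is the utterance.
        (t.get("speaker"), t.get("speech_act"), (t.text or "").strip())
        for t in root.iter("turn")
    ]


turns = read_turns(SAMPLE)
for speaker, act, text in turns:
    print(f"{speaker} [{act}]: {text}")
```

Because each tag occupies its own line and the text is left untouched between the tags, a file of this kind stays readable and can be corrected by hand in a plain text editor, which is presumably what the readability and editability benefits refer to.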
Prof. Martin Weisser's lecture offers useful guidance for research in corpus pragmatics.