What is TS Corpus?
TS Corpus is a Turkish corpus project. It is a general-purpose, tagged corpus containing 491 million POS-tagged tokens (491,360,398 to be exact). TS Corpus aims to combine earlier studies in Turkish computational linguistics with corpus linguistics studies from around the world to form a "usable" and "accessible" Turkish corpus as an end-user product.
TS Corpus is the first POS-tagged Turkish corpus ever built. The first version of the corpus was published in March 2012 and the second version in August 2012. Work on version 2.1 is still in progress.
TS Corpus Search Engine 
In order to crawl "big data" for the next version of TS Corpus (TS Corpus version 3), a new infrastructure has been set up on the TS Corpus server. Part of this framework is a search engine. The TS Corpus search engine uses Apache Nutch, Apache Solr, Apache Lucene and CrawlAnywhere.
The TS Corpus Search Engine now has over 160 thousand web pages indexed, and this number is growing rapidly every hour. Click the link below for a test drive.
TS Corpus Search Engine
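For readers curious how an index of this kind is typically queried: Solr exposes an HTTP select handler that takes the query in the q parameter. The Python sketch below only builds such a URL. The host localhost:8983, the core name tscorpus and the field name content are hypothetical placeholders, not details of the TS Corpus server.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, term, rows=10):
    """Build a URL for Solr's /select handler.

    base_url is the address of a Solr core (hypothetical in this example).
    The field name "content" is an assumption; real field names depend on
    the schema of the index.
    """
    params = {
        "q": f"content:{term}",  # field:term query syntax
        "rows": rows,            # number of hits to return
        "wt": "json",            # ask for a JSON response
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local core, used only to show the shape of the URL.
url = solr_select_url("http://localhost:8983/solr/tscorpus", "kitap")
print(url)
```

Nothing is sent over the network here; the function only assembles the request URL.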
TS Corpus at a Glance
- TS Corpus is the largest Turkish corpus ever built
- TS Corpus is the first Turkish corpus whose data is POS-tagged
- TS Corpus is the first Turkish corpus whose data is morphologically tagged
- TS Corpus is the first Turkish corpus that has tagged lemmas and can be searched by lemma
- TS Corpus is the first Turkish corpus that can be accessed online
- TS Corpus is the first Turkish corpus that can present 7 different kinds of statistical information to the user
- TS Corpus is the first Turkish corpus built on the CWB/CQP infrastructure
- TS Corpus is the first Turkish corpus that lets users save results in different file formats
- TS Corpus is the first Turkish corpus that has been made freely available
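On a CWB/CQP-based corpus, searches by lemma or part-of-speech tag are written as CQP token expressions in square brackets. As a rough illustration only, the Python sketch below assembles such expressions as strings. The attribute names lemma and pos, the tag value Verb, and the helper names cqp_token and cqp_sequence are assumptions made for this example, not the documented TS Corpus attribute set.

```python
def cqp_token(**constraints):
    """Build one CQP token expression, e.g. [lemma="ev" & pos="Noun"].

    Keyword names are used as attribute names. Which attributes exist
    depends on how the corpus was encoded (assumed here). CQP treats the
    quoted values as regular expressions.
    """
    if not constraints:
        return "[]"  # an empty bracket pair matches any token
    parts = []
    for attr, value in constraints.items():
        if '"' in value:
            # Keep the sketch simple: no quote escaping is attempted.
            raise ValueError("double quotes are not supported in this sketch")
        parts.append(f'{attr}="{value}"')
    return "[" + " & ".join(parts) + "]"


def cqp_sequence(*tokens):
    """Join token expressions into a query for consecutive tokens."""
    return " ".join(tokens) + ";"


# The lemma "ev" immediately followed by any token tagged as a verb.
query = cqp_sequence(cqp_token(lemma="ev"), cqp_token(pos="Verb"))
print(query)
```

The resulting string could be pasted into a CQP query field; the actual tag values must be taken from the corpus documentation.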
TS Corpus Links on the Internet
Since the release of TS Corpus version 2, several notable websites have linked to TS Corpus. Some of them are:
CWB (Corpus WorkBench) Official Website
Michigan University Library Database (USA)
North Carolina University (USA)
Washington University (USA)
George Mason University (USA)
Alphabit.net (Spain)
Wikipedia
New User Interface of TS Corpus
Dear TS Corpus users, TS Corpus has moved to its new user interface. The new interface has new features such as:
-- Line highlighting on the results screen
-- A more polished display of the Query History and Saved Queries screens
-- A new navigation bar with drop-down menus
-- Standard queries are now triggered by the Enter/Return key
-- etc.
The new interface also works on small displays such as smartphones and tablet PCs. The main TS Corpus page now fits on displays as small as 3.2 inches.
Because the interface is built on CSS3, it is compatible with all modern browsers such as Firefox, Opera, Safari and Iceweasel.
Hope you like it...
TS Corpus is Now on a New Server
Until now, TS Corpus was hosted on limited infrastructure. Because of power cuts and ISP problems, it reached only 92% uptime.
TS Corpus has now moved to a professional level. The corpus is hosted at OVH, the largest data center in Europe. It is served from a load-balanced database structure on a professional data-center server with a 100 Mbps internet connection and reliable, uninterrupted service.
With its new professional infrastructure, TS Corpus can now process your queries 40% faster and deliver results 5 times faster.
TS Corpus Turkish Frequency Lists
The TS Corpus frequency lists have been published. For the first time for Turkish, the frequency lists include morphological analyses and the possible roots of the words.
Click here for TS Corpus frequency lists.