Text and data mining (TDM)
The term "text and data mining" (TDM) refers to automated processes for extracting information from large collections of texts or data (corpora). Information can be derived from unstructured or weakly structured text (text mining) or from structured data (data mining).
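The difference between the two can be illustrated with a minimal Python sketch. It uses only the standard library, and the function names and sample data are invented for illustration. In free text, structure such as word counts first has to be derived from the text itself. Structured records, by contrast, can be aggregated directly by field.

```python
import re
from collections import Counter


# Text mining: unstructured text has no predefined fields, so structure
# (here: word frequencies) has to be derived from the text itself.
def term_frequencies(text: str) -> Counter:
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


# Data mining: structured records already have named fields, so they
# can be aggregated directly (here: number of records per field value).
def counts_by_field(records: list, field: str) -> Counter:
    return Counter(record[field] for record in records)


abstract = "Mining text requires tokenising text before counting terms."
print(term_frequencies(abstract).most_common(1))  # [('text', 2)]

papers = [
    {"title": "A", "year": 2021},
    {"title": "B", "year": 2022},
    {"title": "C", "year": 2022},
]
print(counts_by_field(papers, "year"))  # Counter({2022: 2, 2021: 1})
```

Real TDM projects use far more elaborate tokenisation and analysis, but the basic contrast stays the same.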
Legal information
Use of the resources and access to them is subject to various legal and technical terms of use. If you are planning to analyse content from resources licensed by the library in the course of your research, please note that automated mass downloading of full texts or other information using a crawler, script, bot or similar methods is not permitted and can lead to access being blocked.
However, many content providers offer access via dedicated interfaces (APIs). Licensed content can then be used in TDM projects for scientific (non-commercial) purposes, but each provider's consent to your specific TDM project must be obtained in advance. Contact information for this purpose can be found on the linked websites.
Data Sources
On this page, you will find an overview of resources for text and data mining. If you need organizational support for data access, contact us by e-mail.
Licensed content can be used for TDM for scientific purposes.
Provider | Content | Notes on usage |
AAAS - American Association for the Advancement of Science | AAAS publishes six peer-reviewed journals. TU Wien has a subscription for Science and Science Robotics. | No API available |
American Chemical Society (ACS) | ACS Publications publishes a range of journals covering all aspects of the chemical sciences and related fields (ACS Publications platform). | TDM information - ACS. No API; local TDM agreement required. |
Cambridge University Press | Cambridge University Press publishes more than 420 journals covering the humanities and social sciences as well as science, technology and medicine (Cambridge Core platform). | TDM information - CUP. No API available. |
Elsevier | Elsevier publishes over 2300 journals in the physical sciences and engineering, life sciences, social sciences and humanities, and health (ScienceDirect platform). | TDM information - Elsevier. Access via the Elsevier API or the Crossref TDM API. |
Emerald | Emerald publishes journals in a wide range of fields, including engineering, applied sciences and technology, management, and library and information sciences (Emerald Insight platform). | TDM information - Emerald. No API available. |
JSTOR Labs | JSTOR hosts over 2800 scholarly journals from the humanities, social sciences and sciences. JSTOR works with nearly 1200 publishers from more than 57 countries to preserve their content and make it digitally available. | JSTOR Labs: various APIs and open-source projects available. |
Oxford University Press | Oxford University Press publishes over 500 peer-reviewed academic journals, together with learned societies, across all disciplines, including science and mathematics, the arts and humanities, the social sciences, and medicine and health (Oxford Academic platform). | TDM information - OUP. No API available. |
Royal Society of Chemistry | The Royal Society of Chemistry publishes 52 journals covering the chemical sciences and related fields. | TDM information - RSC. No API; local TDM agreement required. |
SAGE | TU Wien Bibliothek licenses around 25 SAGE journals in spatial planning, mechanical engineering and computer science. | TDM information - SAGE. Access via the Crossref TDM API. |
Springer Nature | Springer publishes over 2900 journals in science, technology and medicine (STM) and the humanities (SpringerLink platform). | TDM information - Springer Nature. Access via the Springer API. A local TDM agreement has been concluded for licensed journals and Lecture Notes. |
Taylor & Francis | Taylor & Francis publishes over 2700 peer-reviewed journals from a wide range of disciplines (Explore Taylor & Francis journals). | TDM information - Taylor & Francis. No API available. |
Wiley | Wiley offers a portfolio of 1600 journals in the life, health and physical sciences, social sciences and humanities. Half of these journals are published in partnership with prestigious international scholarly and professional societies. | TDM information - Wiley. A local TDM agreement has been concluded for licensed journals. Access requires an ORCID iD and is provided via the Crossref API. |
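Some providers in the table above, including Elsevier, SAGE and Wiley, route access through the Crossref TDM API. For these providers, the full-text links are exposed in each article's Crossref metadata. The following Python sketch covers only the offline part of that step. It builds the Crossref REST API metadata URL for a DOI and then selects the `link` entries whose `intended-application` is `text-mining`. The DOI, the publisher URLs and the helper names are placeholders invented for illustration, and the sketch deliberately downloads nothing. Fetching the full texts themselves still requires the provider's consent and any credentials it issues.

```python
from typing import Optional
from urllib.parse import quote

CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_work_url(doi: str) -> str:
    """Crossref REST API metadata URL for a single DOI."""
    # Percent-encode unusual DOI characters but keep the "/" after the prefix.
    return CROSSREF_WORKS + quote(doi, safe="/")


def text_mining_links(work: dict, content_type: Optional[str] = None) -> list:
    """Return full-text URLs that the publisher has flagged for text mining.

    `work` is the parsed JSON returned by the /works/{doi} endpoint.
    """
    links = work.get("message", {}).get("link", [])
    return [
        link["URL"]
        for link in links
        if link.get("intended-application") == "text-mining"
        and (content_type is None or link.get("content-type") == content_type)
    ]


# Hypothetical response: DOI and URLs are invented for illustration.
sample = {
    "status": "ok",
    "message-type": "work",
    "message": {
        "DOI": "10.1000/example",
        "link": [
            {"URL": "https://publisher.example/fulltext.xml",
             "content-type": "text/xml", "content-version": "vor",
             "intended-application": "text-mining"},
            {"URL": "https://publisher.example/fulltext.pdf",
             "content-type": "application/pdf", "content-version": "vor",
             "intended-application": "similarity-checking"},
        ],
    },
}

print(crossref_work_url("10.1000/example"))  # https://api.crossref.org/works/10.1000/example
print(text_mining_links(sample))  # ['https://publisher.example/fulltext.xml']
print(text_mining_links(sample, "application/pdf"))  # []
```

Filtering on `intended-application` avoids picking up links that publishers register for other purposes, such as similarity checking.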
Provider | Content |
Preprint collection from the fields of physics, mathematics, computer science, electrical engineering, statistics, financial mathematics and biology | |
Around 300 open access journals from the disciplines biology and medicine | |
CORE | The world's largest aggregator of open access research papers from repositories and journals |
Crossref text and data mining | Full-text documents from participating publishers regardless of the publishing model (both open access and subscription content). User guides are available on crossref.org. |
Digital library of European cultural heritage material, including digitised books, films, and museum and archive collections from over 2000 European institutions | |
HathiTrust Digital Library | Datasets from the Internet Archive and Google Books, and locally digitised items from over 120 academic institutions worldwide |
Over 2 million freely downloadable books and other texts | |
Public Library of Science (PLOS) | Access to the journals of the Public Library of Science, a nonprofit academic open access publisher |
PubMed Central: Databases and Text Mining Tools | Various freely downloadable mining tools for searching PubMed Central, a free resource for content from the life and biomedical sciences |
Structured data from Wikipedia and other free knowledge databases | |
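To locate candidate documents through Crossref, you can query the `/works` endpoint of the Crossref REST API. A filter such as `has-full-text:true` restricts results to records that carry full-text links. The sketch below assumes Python 3 and a placeholder contact address (`researcher@example.org`). Crossref recommends that API users identify themselves through the `mailto` parameter. The sketch only builds the request URL; the query terms, the row count and the helper name `crossref_search_url` are illustrative choices, not part of any official client.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_search_url(query: str, rows: int = 20, mailto: Optional[str] = None) -> str:
    """Build a Crossref /works search URL limited to journal articles with full-text links."""
    params = {
        "query": query,
        # Different filter names in one comma-separated list are combined with AND.
        "filter": "has-full-text:true,type:journal-article",
        "rows": rows,
    }
    if mailto:
        # Identifies the requester to Crossref (placeholder address below).
        params["mailto"] = mailto
    return CROSSREF_WORKS + "?" + urlencode(params)


url = crossref_search_url("text mining", rows=5, mailto="researcher@example.org")
print(url)
# Decode the query string again to check which parameters would be sent.
print(parse_qs(urlparse(url).query))
```

Keeping `rows` small while testing a query is a considerate default. Large-scale harvesting is exactly the kind of mass download that the terms of use discussed on this page restrict.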
- Content Mining: Free Corpora for mining (University of Southern California Libraries)
- Text mining & text analysis > Open sources (The University of Queensland Library)