To analyse each concept, the HE Team works with linguists and humanitarian experts to understand how the concept is used in the collection of documents (the HE Corpus) and to compare this with the experiences of local humanitarian actors on the ground.
This analysis involves multiple components:
- Natural language processing, supported by Sketch Engine software, is used to identify patterns in the text, including:
  - frequencies by organisation type, region, document type and year
  - definitions and elements of a definition
  - related concepts
  - nearby terms (collocations) that qualify the usage of a concept
  - synonyms and antonyms of the concept
  - usage over time, trends, debates and controversies
- Initial findings of the textual analysis are visualised, summarised and explained in the concept overview page.
- Detailed findings are compiled and explained in a full concept entry page.
- Further analysis investigates key debates around the concept’s use in humanitarian action.
- Local humanitarian actors share their perspectives and discuss with experts their experience of the concept in practice.
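As a rough illustration of the "nearby terms" step listed above, the sketch below counts the words that occur within a fixed window around a concept. It is a plain co-occurrence count written for this page, not Sketch Engine's own method or API; the function name `collocates`, the `window` parameter and the sample sentences are all assumptions made for the example.

```python
import re
from collections import Counter


def collocates(text, concept, window=3):
    """Count tokens found within `window` positions of each occurrence of `concept`."""
    # Simplistic tokenisation (an assumption): lowercase alphabetic runs only.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != concept:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the concept occurrence itself
                counts[tokens[j]] += 1
    return counts


# Invented sample text, for illustration only.
sample = (
    "Community resilience is built locally. "
    "Donors fund resilience programmes. "
    "Community resilience takes time."
)
result = collocates(sample, "resilience", window=1)
print(result.most_common(3))
```

A real analysis would normally weight these raw counts with an association score and filter out function words such as "is"; the raw counts here only show the basic idea of a collocation window.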
About the HE “Corpus” Data: A collection of humanitarian documents
The HE “Corpus” is a collection of 4,824 documents, amounting to a total of 71,201,157 distinct words. Each document contains metadata, which allows us to divide the documents by organisation type, region, year of publication, and document type.
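To make the role of the metadata concrete, here is a minimal sketch of how documents carrying these four fields could be grouped. The `Document` class, its field names, the `words_by` helper and every value in the sample records are hypothetical; they are not taken from the HE Corpus itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    # Hypothetical field names mirroring the four metadata dimensions.
    org_type: str
    region: str
    year: int
    doc_type: str
    word_count: int


def words_by(documents, field):
    """Sum word counts per value of the given metadata field."""
    totals = defaultdict(int)
    for doc in documents:
        totals[getattr(doc, field)] += doc.word_count
    return dict(totals)


# Invented records, for illustration only.
docs = [
    Document("NGO", "Africa", 2015, "Activity Report", 12000),
    Document("NGO", "Asia", 2017, "General Document", 8000),
    Document("Think Tank", "Africa", 2017, "Strategy", 5000),
]
by_region = words_by(docs, "region")
by_org = words_by(docs, "org_type")
print(by_region)
print(by_org)
```

The same helper works for any of the four fields, which is what allows one collection to be compared along several dimensions.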
Organisation Types and Subtypes
Breaking down the document collection by organisations shows how different parts of the humanitarian community use the same concepts in different ways. The collection consists of documents produced by different humanitarian organisations, categorised into 11 types and 26 subtypes.
The following interactive chart shows the composition of the document collection by organisation type and subtype.
Click or hover on a wedge to see more information and the subtypes for each type of organisation.
Regions
Comparison across regions shows how a concept is used in documents from different parts of the world. Documents are classified into seven regions: Africa, Asia, CCSA (Caribbean, Central and South America), Europe, MENA (Middle East and North Africa), North America and Oceania.
The following map shows the distribution of words in the document collection by the region where the publishing organisation is based.
Click on a bubble to see the distribution of words per region. You can also filter by organisation type.
Document Types & Year of Publication
All documents in the collection are classified into three categories: General Document, Activity Report and Strategy. Most documents are General Documents, followed by Activity Reports. In addition, the year of publication is attached to each document, which can show when a concept emerged or how it evolved over time.
The following histogram shows the number of documents for each year (2005-2019) in the collection, broken down by document type.
Click or hover on a bar to see the number of words per document type for that year.
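Because the number of documents differs from year to year, raw counts of a concept are hard to compare over time. One common corpus-linguistics convention, assumed here rather than confirmed as the HE Team's approach, is to express usage as occurrences per million words for each year. The function name `per_million_by_year` and the sample figures are invented for the sketch.

```python
from collections import defaultdict


def per_million_by_year(records):
    """records: iterable of (year, concept_hits, total_words), one tuple per document."""
    hits = defaultdict(int)
    words = defaultdict(int)
    for year, concept_hits, total_words in records:
        hits[year] += concept_hits
        words[year] += total_words
    # Multiply before dividing; skip years that contain no words at all.
    return {
        year: hits[year] * 1_000_000 / words[year]
        for year in sorted(words)
        if words[year] > 0
    }


# Invented figures, for illustration only.
records = [
    (2005, 2, 40_000),
    (2005, 0, 10_000),
    (2019, 30, 60_000),
]
trend = per_million_by_year(records)
print(trend)
```

Normalising in this way lets a year with few documents be compared with a year with many, which matters when asking when a concept emerged.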