This is a description of the categories used in LeiLanD and their corresponding categories in the CMDI profile.

Field in LeiLanD

Description

Corresponding CMDI category

Annotation type

the type of annotation, understood as any enrichment of the data (e.g. specification of linguistic phenomena, transcriptions, translations, glossing, POS tagging)

annotation:type

Author

the creator of the dataset

ResourceCreator

Availability

refers to whether the dataset is openly accessible or not

Open: the dataset can be accessed and re-used.
If there is no direct link to the dataset, it has most likely not been published yet, but its author is willing to share it. Get in touch with the LeiLanD team to enquire about the accessibility of the dataset.

Restricted: certain conditions have to be met before the dataset can be accessed, or only some parts of the dataset can be accessed. Get in touch with the LeiLanD team to enquire about the accessibility of the dataset.

access:availability

Contact Person

the contact person for the dataset

access:ContactPerson

Corpus type

refers to the linguality of the dataset, for example whether the dataset is monolingual, bilingual, or multilingual.

lingualityType

Data format

the format in which the data is provided

Media

Data type

the kind of data in the dataset.

Audio: the data is in audio format

Audiovisual: the data contains videos

Photo: the data contains images

Written: the data contains text

DCType

Domain

provides information about the domain of research and disciplines associated with the dataset.

Domain

Gender

refers to the gender of the participants in the dataset

None

Language

the language(s) that make up the dataset

Language

Proficiency of the speakers

refers to the order in which the speakers acquired the language.

L1: first language

L2: second language

L3: third language

Other: used for cases such as artificial languages

lingualityNativeness

Location

place where the data was collected

geographicalProvenance

Metadata

whether the dataset comes with additional metadata

None

Modality

modality of the data.

Signed: the data is made up of signed language

Written: the data is made up of written text

Spoken: the data is made up of spoken utterances

Modality

Persistent Identifier

the persistent identifier in the CMDI profile of the RU Collection Bank: https://applejack.science.ru.nl/collbank  

None

Publisher

the URL of the archive, server, and/or website hosting the dataset

access:website

Software

whether software was used or developed to work with the dataset

None

Years

years during which the data collection took place

temporalProvenance
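
The correspondence listed above can also be written down as a lookup table, which may be convenient when converting LeiLanD records by script. The following is a minimal sketch in Python, not part of LeiLanD or of the CMDI profile itself: the names `LEILAND_TO_CMDI` and `to_cmdi` are hypothetical, the example record is made up, and fields listed above with "None" are assumed to be dropped during conversion.

```python
from typing import Optional

# Field-to-category pairs copied from the list above.
# None marks a LeiLanD field that has no corresponding CMDI category.
LEILAND_TO_CMDI: dict[str, Optional[str]] = {
    "Annotation type": "annotation:type",
    "Author": "ResourceCreator",
    "Availability": "access:availability",
    "Contact Person": "access:ContactPerson",
    "Corpus type": "lingualityType",
    "Data format": "Media",
    "Data type": "DCType",
    "Domain": "Domain",
    "Gender": None,
    "Language": "Language",
    "Proficiency of the speakers": "lingualityNativeness",
    "Location": "geographicalProvenance",
    "Metadata": None,
    "Modality": "Modality",
    "Persistent Identifier": None,
    "Publisher": "access:website",
    "Software": None,
    "Years": "temporalProvenance",
}


def to_cmdi(record: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: rename LeiLanD fields to their CMDI categories."""
    converted: dict[str, str] = {}
    for field, value in record.items():
        if field not in LEILAND_TO_CMDI:
            raise KeyError(f"unknown LeiLanD field: {field!r}")
        category = LEILAND_TO_CMDI[field]
        if category is not None:  # skip fields without a CMDI counterpart
            converted[category] = value
    return converted


# Made-up record, for illustration only.
example = {"Author": "A. Researcher", "Gender": "mixed", "Years": "2019-2021"}
print(to_cmdi(example))
```

Raising an error on unknown field names, rather than silently ignoring them, is a design choice of this sketch; it makes spelling differences between field names visible early.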