Syntax jth1

The purpose is to transfer the information contained in original civil registry records into files that programs can process. This is generally done with the GEDCOM format, which stores information about persons and their family relationships.
To build my family tree, I developed an alternative syntax, compatible with GEDCOM, but able to handle more information.
The status of this syntax is "operational draft". This means that there is no formal definition of the syntax; the precise rules are fixed by the implementation of the program that parses the file to build structured data. This implementation is part of the Jetheme program (not published yet). The plan is to extract the jth1 code from Jetheme's code to make a standalone program.

Jth1 uses YAML syntax, which is both easy for a human to type and easy for a program to parse.


Here is an example of a person expressed in jth1:
Typed by a human
    nom: Vincent Perre
    sexe: M
    profession: Cultivateur
    domicile: Berrias, 07
      date: '1728-06-05'
      lieu: Berrias, 07
      date: '1728-06-07'
      lieu: Berrias, 07
    père: Pierre Perre
    mère: Gabrielle Thibon
        avec: Jeanne Bayle
          date: '1754-11-28'
          lieu: Berrias, 07
        avec: Thérèse Monbel
          date: '1758-04-04'
          lieu: Berrias, 07
        avec: Marie Coste
          date: '1762-10-26'
          lieu: Berrias, 07
      - acte de décès de son fils Vincent Perre
Generated by a program
    local-id: jean-rocard
    sex: M
    official-name: 'Jean ROCARD'
      date: ~1606
      place: 'Vaux-sous-Aubigny,52190,Haute-Marne,Champagne-Ardenne,FRANCE '
      date: '1681-09-07'
    profession: Vigneron
        with: jeanne-dadant
          - francois-rocard
          - jeanne-chinardet
          - jean-rocard-1
          - jeanne-rocard
          - nicolas-rocard
          - jacques-rocard
      - ';p=jean;n=rocard'

Operating rules

A YAML file is treated like a GEDCOM file, in the sense that it contains definitions of persons and relationships that must be consistent with each other.
As with GEDCOM, consistency is only required within a given file: if you build a tree by aggregating several GEDCOM files, merging the trees and ensuring global consistency is the responsibility of the genealogy software; it is not part of the GEDCOM syntax.

Unique id of a person

This is the most important point: every person must be identified by a unique id.
In GEDCOM files, this is done with syntax like @I1@ INDI for persons, or FAMS @F1@ and FAMC @F2@ for relationships.
In jth1, the links are implicit; the user must be aware of this and is responsible for keeping the ids consistent.
Here is the mechanism:
  • By default, the id of a person is a "slug" built from the person's name, followed by a hyphen and the birth year (if it is known).
    The "slug" of a string is another string in which all accents are removed, all letters are lower-cased, and non-alphanumeric characters are replaced by a hyphen.
    In the previous example, the id of Vincent Perre is vincent-perre-1728.
  • In case of duplicates (which happen in particular for homonyms with an unknown birth year), the ids are built by appending -1, -2, etc. to the id.
  • This automatic construction can be overridden by the user, who can impose an id using a local-id field.
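
The mechanism above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Jetheme implementation: the function names slugify and make_id and the taken set are hypothetical, and collapsing a run of several non-alphanumeric characters into a single hyphen (and trimming hyphens at both ends) is an assumption that the rules above do not spell out.

```python
import re
import unicodedata


def slugify(text):
    """Remove accents, lower-case, and replace non-alphanumeric characters by a hyphen."""
    # Decompose accented letters (e.g. "é" becomes "e" + combining accent),
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Assumption: a run of non-alphanumeric characters yields one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", without_accents.lower())
    return slug.strip("-")


def make_id(name, birth_year=None, local_id=None, taken=None):
    """Build a person id; `taken` is the set of ids already used in the file."""
    if taken is None:
        taken = set()
    if local_id:
        # An id imposed by the user overrides the automatic construction.
        candidate = local_id
    else:
        base = slugify(name)
        if birth_year is not None:
            base = f"{base}-{birth_year}"
        candidate = base
        suffix = 1
        # Duplicates receive -1, -2, etc.
        while candidate in taken:
            candidate = f"{base}-{suffix}"
            suffix += 1
    taken.add(candidate)
    return candidate


ids = set()
first = make_id("Jean ROCARD", taken=ids)
second = make_id("Jean ROCARD", taken=ids)
```

With these assumptions, make_id("Vincent Perre", 1728) gives vincent-perre-1728, and the two homonyms above receive jean-rocard and jean-rocard-1, which matches the ids visible in the generated example.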

Defining relationships

Relationships can be expressed using either person names or person ids.
In general, it is more convenient to use the person names directly, but using the person id may be necessary when there are duplicates.


Each YAML file can have a header, which is used to record comments about the author, to specify the vocabulary file, and to indicate the transformations to apply to place names for GeoNames matching.

Vocabulary file

As you can see in the example, some files are written using an English vocabulary, and other files use French. For example, the notion of birth can be expressed as birth: or naissance:. This is done with vocabulary files: the parser uses a single vocabulary (almost English), but users can associate a vocabulary file with their YAML files.

Here are extracts of the vocabulary file I used:
# general terms
type:             type
id-local:         local-id
par:              by
rôle:             role

### time
date:             date
début:            begin
fin:              end
naissance:        birth
mort:             death
décès:            death

### geo
lieu:             place
pays:             country
lieu-précis:      precise-place

### names
nom:              official-name
prénom:           given-name
nom-famille:      family-name
nom-courant:      name
surnom:           nickname
noms-alternatifs: alternative-names

### relations and events
relations:        relations
mariage:          mariage               # marriage with 2 r in correct english
                                        # but jth1 syntax is not exact english
contrat-mariage:  mariage-contract
divorce:          divorce
union-libre:      free-union
notaire:          notary
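
As a rough illustration of how such a vocabulary file could be applied, the sketch below renames the keys of already-parsed YAML data into the parser's vocabulary. It is not taken from Jetheme: the function name translate_keys is hypothetical, keeping unknown keys unchanged is an assumption, and so is the nesting of date and lieu under naissance in the sample person.

```python
def translate_keys(data, vocabulary):
    """Recursively rename mapping keys from the user's vocabulary to the parser's."""
    if isinstance(data, dict):
        return {
            # Assumption: a key missing from the vocabulary is kept as-is.
            vocabulary.get(key, key): translate_keys(value, vocabulary)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [translate_keys(item, vocabulary) for item in data]
    # Scalars (names, dates, places) are values, not keys: leave them untouched.
    return data


# A few entries copied from the vocabulary extract above.
vocabulary = {
    "nom": "official-name",
    "naissance": "birth",
    "lieu": "place",
    "date": "date",
}

# Hypothetical parsed content of a French-vocabulary file.
person = {
    "nom": "Vincent Perre",
    "naissance": {"date": "1728-06-05", "lieu": "Berrias, 07"},
}

translated = translate_keys(person, vocabulary)
```

Here translated uses the keys official-name, birth, date and place, while the values are unchanged.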

Splitting a yaml file into multiple files

One drawback of jth1 is that the information may become messy when the tree starts to contain hundreds of persons. So I introduced the notion of "split YAML file", which makes it possible to break a YAML file up into multiple files.
The rule is simple: all the files must be located in the same directory, and one file (the "root file") must contain the header information; this file is the first file in alphabetical order.
For example, here are the files I wrote:
    ├── 1.asc-thierry-graff.yml
    ├── MMM-asc-stella-fetonti-1882.yml
    └── PPPM-asc-clemence-morin-1862.yml
Letter prefixes such as PPPM are a convention I use to organize my files; they are not part of the jth1 syntax.
The important point here is that the file 1.asc-thierry-graff is the root file.

This makes it easy to split the files as the tree grows.
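
The root-file rule can be sketched as below. This is a minimal illustration, not Jetheme's code: the function name split_files is hypothetical, and reading "alphabetical order" as a plain case-sensitive sort of the file names, restricted to .yml and .yaml files, is an assumption.

```python
from pathlib import Path


def split_files(directory):
    """Return (root_file, other_files) for a directory holding a split YAML file."""
    # Assumption: only .yml / .yaml files belong to the split file.
    yaml_files = [
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix in (".yml", ".yaml")
    ]
    if not yaml_files:
        raise FileNotFoundError(f"no YAML file found in {directory}")
    # Assumption: "alphabetical order" is a case-sensitive sort on the file name.
    yaml_files.sort(key=lambda path: path.name)
    # The first file is the root file, which carries the header.
    return yaml_files[0], yaml_files[1:]
```

Applied to the directory listed above, this sketch would pick 1.asc-thierry-graff.yml as the root file, since the digit 1 sorts before the upper-case letters.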

Comments and analysis

The development of this syntax was motivated by two main reasons:
  • Storing my data in a textual format was a non-negotiable requirement for me. GEDCOM is stored in text files, but its syntax is not convenient if one tries to build a genealogy by editing GEDCOM files directly. The normal process is to use genealogy software, which generally stores the information in a relational database.
  • The limitations of GEDCOM: things like "free union", "PACS" (a French status between free union and marriage) and several pieces of information that can be found in the records are not part of the GEDCOM format; in particular, there is no way to express the notion of an approximate date.

Advantages and drawbacks

The main advantage of the jth1 syntax is that the information is stored directly in text files, which makes it possible to manage a family tree like a software project, using tools like git to record modifications frequently and to make backups on remote machines.

Another advantage is the ability to extend the vocabulary when a new type of information is found. This is quite easy at this stage of development.

The main drawback is that YAML files can become too big to be manageable as the tree grows. This is addressed by the notion of split YAML file, but one needs to be careful and rigorous.