- The ability to store my data in a textual format was an important requirement for me. GEDCOM is stored in text files, but its syntax is not convenient if one tries to build a genealogy by editing GEDCOM files directly. The normal process is to use genealogy software, which generally stores the information in a relational database.
- The limitations of GEDCOM: things like "free union", "PACS" (a French status between free union and marriage) and several pieces of information that can be found in the acts are not part of the GEDCOM format; in particular, there is no way to express the notion of an approximate date.
There is no formal definition of the syntax; the precise rules are fixed by the implementation of the program that parses the file to build structured data. This implementation is part of the Jetheme program. The plan is to extract the jth1 syntax code from Jetheme's code to make a standalone program.
Jth1 uses YAML syntax, which is both easy for a human to type and understandable by a program.
Example
Here is an example of a person expressed in jth1:
Typed by human
- nom: Vincent Perre
  sexe: M
  profession: Cultivateur
  domicile: Berrias, 07
  naissance:
    date: '1728-06-05'
    lieu: Berrias, 07
  baptême:
    date: '1728-06-07'
    lieu: Berrias, 07
  père: Pierre Perre
  mère: Gabrielle Thibon
  relations:
    - avec: Jeanne Bayle
      mariage:
        date: '1754-11-28'
        lieu: Berrias, 07
    - avec: Thérèse Monbel
      mariage:
        date: '1758-04-04'
        lieu: Berrias, 07
    - avec: Marie Coste
      mariage:
        date: '1762-10-26'
        lieu: Berrias, 07
  sources:
    - acte de décès de son fils Vincent Perre
Generated by program
- local-id: jean-rocard
  sex: M
  official-name: 'Jean ROCARD'
  birth:
    date: ~1606
  death:
    place: 'Vaux-sous-Aubigny,52190,Haute-Marne,Champagne-Ardenne,FRANCE'
    date: '1681-09-07'
  profession: Vigneron
  relations:
    - with: jeanne-dadant
      children:
        - francois-rocard
        - jeanne-chinardet
        - jean-rocard-1
        - jeanne-rocard
        - nicolas-rocard
        - jacques-rocard
  sources:
    - 'http://gw.geneanet.org/jeanpierre16?lang=en;p=jean;n=rocard'
Operating rules
One YAML file is treated like a GEDCOM file, in the sense that it contains definitions of persons and relationships that must be consistent. As with GEDCOM, consistency is only required within a given file: if you build a tree by aggregating several GEDCOM files, merging the trees and ensuring global consistency is the responsibility of the genealogy software; it is not part of the GEDCOM syntax.
Unique id of a person
This is the most important point: every person must be identified by a unique id. In GEDCOM files, this is done with syntax like @I1@ INDI for persons, or FAMS @F1@ and FAMC @F2@ for relationships.
In jth1, the links are implicit; the user must be aware of this and is responsible for keeping the ids consistent.
Here is the mechanism:
- By default, the id of a person is composed of a "slug" built from the person's name, followed by a hyphen and the birth year (if it is known). A "slug" of a string is another string where all accents are removed, all letters are lower-cased, and non-alphanumeric characters are replaced by a hyphen. In the previous example, the id of Vincent Perre is vincent-perre-1728.
- In case of duplicates (this happens in particular for homonyms with unknown birth year), the ids are built by adding -1, -2, etc. to the id.
- This automatic building can be overridden by the user, who can impose an id using a local-id field.
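The id rules above can be sketched in Python. This is only an illustration, not Jetheme's actual code: the function names (slugify, default_id, assign_ids) and the birth_year key are hypothetical, and two details are assumptions. Runs of non-alphanumeric characters are collapsed into a single hyphen, and the first person with a given id keeps it unchanged while later ones receive -1, -2, etc.

```python
import re
import unicodedata


def slugify(text):
    """Remove accents, lower-case, and replace non-alphanumerics by hyphens."""
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Assumption: a run of non-alphanumeric characters becomes one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", no_accents.lower()).strip("-")


def default_id(name, birth_year=None):
    """Slug of the name, followed by the birth year when it is known."""
    slug = slugify(name)
    return f"{slug}-{birth_year}" if birth_year else slug


def assign_ids(persons):
    """Return one id per person, in order.

    Each person is a dict with 'name', an optional 'birth_year' and an
    optional 'local-id' that overrides the automatic id.
    """
    seen = {}  # automatic id -> number of persons already using it
    ids = []
    for person in persons:
        if person.get("local-id"):
            ids.append(person["local-id"])
            continue
        base = default_id(person["name"], person.get("birth_year"))
        count = seen.get(base, 0)
        # Assumption: the first occurrence keeps the bare id.
        ids.append(base if count == 0 else f"{base}-{count}")
        seen[base] = count + 1
    return ids
```

For instance, default_id("Vincent Perre", 1728) gives vincent-perre-1728. The sketch does not check whether a user-imposed local-id collides with an automatically built id.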
Defining relationships
Relationships can be expressed using either person names or person ids. In general, it is more convenient to use the person names directly, but using the person id may be necessary in case of duplicates.
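As a rough illustration of why an id is sometimes required, here is a sketch of how a reference found in a relation could be resolved. It is an assumption about the behaviour, not the actual parser: the function resolve_person and the ids_by_name index are hypothetical names.

```python
def resolve_person(ref, ids_by_name):
    """Resolve a reference that is either a person id or a person name.

    ids_by_name maps each person's name to the list of ids carrying it.
    """
    all_ids = {pid for ids in ids_by_name.values() for pid in ids}
    if ref in all_ids:
        # The reference is already an id: nothing to resolve.
        return ref
    candidates = ids_by_name.get(ref, [])
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        raise KeyError(f"unknown person: {ref!r}")
    # Several persons share this name: only an id can tell them apart.
    raise ValueError(f"ambiguous name {ref!r}, use one of {candidates}")


IDS_BY_NAME = {
    "Jeanne Bayle": ["jeanne-bayle"],
    "Jean Rocard": ["jean-rocard", "jean-rocard-1"],
}
```

With this index, "Jeanne Bayle" resolves by name, whereas "Jean Rocard" is ambiguous and must be written as jean-rocard or jean-rocard-1.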
Headers
Each YAML file can have a header, which makes it possible to add comments about the author, specify the vocabulary file, and indicate the transformations to apply to place names for geonames matching.
Vocabulary file
As you can see in the example, some files are written using an English vocabulary, and other files use French. For example, the notion of birth can be expressed as birth: or naissance:
The mechanism for this relies on vocabulary files: the parser uses one unique vocabulary (almost English), but users can associate a vocabulary file with their YAML files.
Here are extracts of the vocabulary file I used:
# general terms
type: type
id-local: local-id
par: by
rôle: role
### time
date: date
début: begin
fin: end
naissance: birth
mort: death
décès: death
### geo
lieu: place
pays: country
lieu-précis: precise-place
### names
nom: official-name
prénom: given-name
nom-famille: family-name
nom-courant: name
surnom: nickname
noms-alternatifs: alternative-names
### relations and events
relations: relations
mariage: mariage
# marriage with 2 r in correct english
# but jth1 syntax is not exact english
contrat-mariage: mariage-contract
divorce: divorce
union-libre: free-union
notaire: notary
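One way to picture the vocabulary mechanism is a recursive renaming of keys once the YAML has been parsed into dictionaries and lists. The sketch below is an assumption about how this could work, not the actual implementation: translate_keys is a hypothetical name, and leaving keys that are absent from the vocabulary unchanged is a guess.

```python
def translate_keys(data, vocabulary):
    """Rename the keys of nested dicts/lists using a user-term -> parser-term map."""
    if isinstance(data, dict):
        # Assumption: a key missing from the vocabulary is kept as-is.
        return {
            vocabulary.get(key, key): translate_keys(value, vocabulary)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [translate_keys(item, vocabulary) for item in data]
    # Scalars (names, dates, places) are values, not vocabulary terms.
    return data


# A few entries taken from the vocabulary extract above.
VOCABULARY = {
    "nom": "official-name",
    "naissance": "birth",
    "date": "date",
    "lieu": "place",
}

person = {
    "nom": "Vincent Perre",
    "naissance": {"date": "1728-06-05", "lieu": "Berrias, 07"},
}
translated = translate_keys(person, VOCABULARY)
```

After translation, the person is described with the keys official-name, birth, date and place, which is the vocabulary used in the program-generated example.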
Splitting a yaml file into multiple files
One drawback of jth1 is that the information may become messy when the tree starts to contain hundreds of persons. So I introduced the notion of "split YAML file", which makes it possible to split a YAML file into multiple files. The rule is simple: all the files must be located in the same directory, and one file (the "root file") must contain the header information; this file is the first file in alphabetical order.
For example, here are the files I wrote:
thierry-graff
├── 1.asc-thierry-graff.yml
├── MMM-asc-stella-fetonti-1882.yml
(...)
└── PPPM-asc-clemence-morin-1862.yml
Letter prefixes such as PPPM are a convention I use to organize my files; they are not part of jth1 syntax.
The important point here is that file 1.asc-thierry-graff is the root file.
This makes it easy to split the files as the tree grows.
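The root-file rule can be sketched as follows. This is an illustration under assumptions, not the real loader: the function names are hypothetical, "alphabetical order" is taken to mean plain string comparison of file names, and only files ending in .yml are considered.

```python
from pathlib import Path


def split_root(file_names):
    """Return (root_file, other_files) from the names of a split YAML file."""
    # Assumption: alphabetical order is plain string ordering of the names.
    ordered = sorted(file_names)
    if not ordered:
        raise ValueError("no YAML file found")
    return ordered[0], ordered[1:]


def find_split_files(directory):
    """Apply split_root to the .yml files located directly in a directory."""
    return split_root(p.name for p in Path(directory).glob("*.yml"))


root, others = split_root([
    "PPPM-asc-clemence-morin-1862.yml",
    "1.asc-thierry-graff.yml",
    "MMM-asc-stella-fetonti-1882.yml",
])
```

Here root is 1.asc-thierry-graff.yml, because the digit 1 sorts before the upper-case letters used by the other files.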
Advantages and drawbacks
The main advantage of jth1 syntax is the ability to store the information directly in text files, which makes it possible to manage a genealogical tree like a software project, using tools like git to record modifications frequently and make backups on remote machines. Another advantage is the ability to expand the vocabulary when a new type of information is found. This is quite easy at this stage of development.
The main drawback is that YAML files can become too big to be manageable when the tree grows. This is addressed by the notion of split YAML file, but one needs to be careful and rigorous.