#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""BDT file information extractor

Read a BDT file and write the extracted information as an XML tree, as database tables (csv) and as easy-to-read lists.
Custom field lists determine which information is extracted.
"""
#==============================================================
__author__ = "cimpa.communication@protonmail.com"
__license__ = "GPL v2 or later"
__date__ = "2022-05-12"
__version__ = "v0.9"
#==============================================================

#==============================================================
#==============================================================
#==============================================================
#
# HOW TO START THE PROGRAM:
#
# Open a console/terminal and change to the folder
# that contains this file (bdt_extractor.py).
#
# Then type:
#
# python bdt_extractor.py
#
#==============================================================
#==============================================================
#==============================================================
'''Structure of this file:

  Comment: 
1/ Explanation of BDT file structure (German and English), and references to it
  (BDT file structure: data for patient, treatment, billing/statement notes. And: doctor's office, and some metadata)

  Program:
2/ A FIELDS: Field lists. The fields are parsed and printed in the order specified.
     Default fields: fields can be added or removed in the code, and their order can be changed
     GNUmed fields: comprehensive and used as the reference; if the file fields.py does not exist, it is created
     Custom csv fields: read from a file

3/ B1 CREATE TREE - private methods: parse a BDT file, create a flat tree first,
    then a hierarchical XML tree. lxml and text operations are used because tree changes would exceed the available RAM
4/ B2 CREATE TREE - public methods: control tree creation
5/ C USE THE TREE: traverse the tree and find information
6/ D OUTPUT: print the metadata found in the BDT file
             write csv files for the database schema: patient:treatment 1:n, patient:statementNote 1:n; with default fields
             write list files with user-defined (custom) fields; the field list is read from a file
             write the tree as an XML file, with the user-defined fields
7/ E PRINT SOME TECHNICAL BACKGROUND: output statistics of the parsing and of the tree
            write the tree with all fields (if custom fields are specified, an additional tree is written with "all fields")

8/ MAIN: First prints an explanatory text on how to start the program and where to configure the field list ("Was macht die Datei?").
     Then the technical background is printed: statistics etc., in English ("Background").
     After that, MAIN continues and the printout is in German:
     "Daten lesen":
       create the XML tree,
       print the metadata ("Metadaten"), print some statistics ("Dies wurde gefunden"),
       write the 3 database files (patient, treatment, note),
       explain the MINI CONFIG ("Eine MINI-KONSOLE bietet diese Eingabe"),
       then print an explanatory text on which files were created and which fields were used ("Dateien wurden erstellt:", "Erklärung").

       Please note: the list of files is intentionally written last, and the MINI CONFIG is described above it;
                    people are assumed to look at the last printed lines, so the most important information is shown there.

     MINI CONFIG, the minimalist configuration by console arguments, is self-explanatory (please see the code):
       additional files can be created and written
       user-defined (custom) field lists, such as the fields of the GNUmed project, can be set
'''

'''
==============================================================
============================================================== Deutsch (English please see below)
==============================================================

BDT Format (xDT)
---------------------------------------
Quellen: 
-- BDT Format / wie ist eine BDT Datei aufgebaut:
    BDT 3.0 Satzbeschreibung, QMS Qualitätsring Medizinische Software e. V. (31.1.2013). v0.9, proposal 
    ! Alte Version des BDT, Felder sind z.T. anders. - Referenz für die Dateistruktur

-- Felder: 
    GNUmed German xDT mapping data, S.Hilbert, K.Hilbert. gmXdtMappings.py
    https://github.com/ncqgm/gnumed/blob/master/gnumed/gnumed/client/business/gmXdtMappings.py
    ! Alte Version der Felder. - Die einzige Auflistung der (meisten) Felder, mit der Bezeichnung dessen, was im Feld steht

-- Berechnung der Zeilenlänge, und eines Datenblocks: (auf diese Weise werden Daten gefunden und begrenzt)
    Diplomarbeit Sebastian Stäubert 2006
    Referenzmodell für die Kommunikation eines Universitätsklinikums mit dem niedergelassenen Bereich, in: Anhang A, XXI - XXII
    ! Inhalt veraltet, sowie das Zeilenbeispiel. - Die Berechnung ist sehr gut erklärt
---------------------------------------

Erklärung: 
- Eine BDT Datei besteht aus 4 Abschnitten: Praxis, Patienten, Behandlungen/Treatment, Abrechnungsnotizen/Statement-Note

- Datenbank Schema: 
-- Die die "Praxis" betreffenden Daten sind unabhängig von den anderen
-- Ein Patient ist identifiziert durch die Patienten-ID (eine vom Praxis-Programm vergebene Nummer)
   Die meisten Patientendaten gibt es nur einmal, einige mehrfach (1 Patient kann haben: n Familienmitglieder, n E-Mailaddressen)
   // in diesem Programm wurde das nicht berücksichtigt, Patienten werden identifiziert (u.a. durch den Wohnort) aber weitere Daten 
      werden nicht ausgelesen. Alle Patientendaten werden behandelt Patient:Information 1:1
-- Ein Patient erhält Behandlungen 1:n
-- Zu einem Patienten kann es Abrechnungsnotizen geben 1:n. Diese können Informationen (z.B. Diagnose) enthalten
   Achtung 1: Beziehung ist: Patient - Notiz, nicht Behandlung - Notiz
   Achtung 2: Notizen enthalten sequentielle Daten, die sich aufeinander beziehen
            (z.B. Zeile x Diagnose, Zeile x+1 Diagnosensicherheit, Zeile x+2 Diagnose, Zeile x+3 Diagnosensicherheit)
            .. hier müsste man Tests durchführen, um dies zu bestätigen (und in das Programm einzuarbeiten)

- BDT Dateistruktur
-- Information ist gegeben: als 1 Zeile, oder als ein Datenblock von n konsekutiven Zeilen
-- Der Beginn der Information ist markiert
-- Das Ende der Information ist (!) nicht markiert. (bzw. in einem Fall geschieht dies; ansonsten wird das Ende berechnet)
==============================================================

Zeilenformat: Wann *beginnt eine Information? // .. [endet* - bitte weiter unten erklärt]
---------------------------------------
bbbFFFFvvvvvvvvvv... 
	
	bbb    die Zeile besteht aus bbb Byte. bbb = count(chars) + 2 denn: die beiden unsichtbaren Charakter CR/LF
	FFFF   Feld. Zeigt an, welche Information in dieser Zeile (oder in dem Zeilenblock, beginnend hier) geschrieben ist
	vvv... Information
oder:
bbbFFFFSSSSvvvvvv...

	SSSS:  Zusatzfeld; konkretisiert das Feld

Felder, die eine Dateiabschnitt markieren: // Praxis, Patient, Behandlung, Abrechnungsnotiz; Referenz BDT 3.0, S. 9
	8000 - Beginn der Information

	0010 - Beginn Praxisdaten
        0022 - Beginn Meta-Daten zur einzulesenden BDT Datei 

	6100 - Beginn Patientendaten
	6200 - Beginn Behandlungsdaten
	xxxx - Beginn Abrechnungsnotiz: verschiedene. u.a. 0101 (Behandlung), 0102 (Überweisung), 0190 (Vermutung: privat) 

	3000 - Patienten-ID
        8100 - Datenblocklänge, in Byte
==============================================================

Berechnung der Datenlänge: Wann endet* eine Information? - // [*beginnt - bitte darüber erklärt] 

   Demonstration (diese Zeilenfolge existiert, so, nicht; dies soll nur die Berechnung erklären)
---------------------------------------
Länge in Byte:
        10        20   
12345678901234567890

01380006100	= 13 = 11 Byte plus 2 CR/LF 
014810000124	= 14 = 12 Byte plus 2 CR/LF
0123000003	= 12 ..
0273101von der Vogelweide
0163102Walther
017310311700101
0153106Zwettl
01031101

bbb FFFF SSSS	// Format
bbb FFFF vvv... 

013 8000 6100		8000: Beginn von ?? - von 6100: Patientendaten // !! Die Bytelänge dieser Zeile zählt mit !!
014 8100 00124  	8100: Gesamtlänge = 124 = 14 (diese Zeile) + {12 + 27 + 16 + 17 + 15 + 10} (Folgezeilen) + 13 (Zeile 8000)
012 3000 003		3000: Patienten-Id = 3
027 3101 von der Vogelweide 	3101: Nachname
016 3102 Walther 		3102: Vorname
017 3103 11700101 	3103: Geburtsdatum
015 3106 Zwettl	  	3106: Ort
010 3110 1	  	3110: m = 1, w = 2

Lesen: Zeilen lesen bis Summe[Byte-Pro-Zeile] == Wert der Zeile Feld 8100 // !! Zeile Feld 8100's Byteangabe dazurechnen

==============================================================
============================================================== English (German please see above)
==============================================================

BDT Format (xDT)
---------------------------------------
References: 
-- BDT format / file structure:
    BDT 3.0 Satzbeschreibung, by QMS Qualitätsring Medizinische Software e. V. (31.1.2013). v0.9, proposal 
    ! Document outdated. Fields differ, but the file structure is up to date

-- Fields: 
    GNUmed German xDT mapping data, by S.Hilbert, K.Hilbert. gmXdtMappings.py
    https://github.com/ncqgm/gnumed/blob/master/gnumed/gnumed/client/business/gmXdtMappings.py
    ! Fields outdated, but at least these are most of the fields

-- Calculate the line and data size: 
    Diplomarbeit Sebastian Stäubert 2006
    Referenzmodell für die Kommunikation eines Universitätsklinikums mit dem niedergelassenen Bereich, in: Anhang A, XXI - XXII
    ! Content outdated, file structure incorrect. But the calculation is well explained
==============================================================

Explanation: 
- The BDT file has 4 sections: Praxis (doctor's practice), Patient(s), Treatment(s), Statement/billing Note(s)

- Schema: 
-- Praxis: the data of the doctor's practice; it is independent of the other sections
-- a patient is identified by a patientId (a number assigned by the practice software).
   Most patient data occurs once, some may occur several times (1 patient : n family members, or n e-mail addresses).
   This program does not handle repeated data: all patient data is treated as patient:information 1:1
-- a patient receives treatments (1:n)
-- a patient is related to statement notes (1:n); the notes relate to the patient, not to a treatment

- BDT file structure:
-- data is specified in lines
-- data is either a single line or a block of consecutive lines
-- the beginning of the data (block) is marked
-- the end of the data (block) is not marked (there is one exception); the end is computed from the declared size instead of being read from an 'end' indicator


---------------------------------------

Format of one line: where does the data start? // [for where it ends, please see below]
---------------------------------------
bbbFFFFvvvvvvvvvv... 
	
	bbb    size of the line in bytes. bbb = count(chars) + 2, because each line ends with the 2 invisible characters CR/LF
	FFFF   field: indicates which data a/ this line or b/ this block of lines (this line plus the following ones) holds
	vvv... value: the data itself
or:
bbbFFFFSSSSvvvvvv...

	SSSS: second indicator; specifies the field further

Indicators for a section: // doctor's practice, patient, treatment or notes; BDT 3.0, p. 9
	8000 - start data block

	0010 - start doctor's practice
        0022 - start of the metadata block of the BDT file itself

	6100 - start patient
	6200 - start treatment
	xxxx - start notes: various, e.g. 0101 (treatment), 0102 (transfer), 0190 (private, assumed) 

	8100 - size of the data, in bytes
	3000 - patient-Id
==============================================================
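
A minimal sketch of how one line in the format above can be split into its parts.
The helper name parse_bdt_line and the returned tuple are illustrative only and are not part of this program.
The sketch assumes that the line is passed without its trailing CR/LF and contains only single-byte characters.

```python
CRLF_BYTES = 2  # every line ends with CR/LF, which counts towards bbb


def parse_bdt_line(line):
    # Split 'bbbFFFFvvv...' into (declared size, field, value).
    # Hypothetical helper: expects the line without its trailing CR/LF.
    if len(line) < 7:
        raise ValueError('line too short: %r' % (line,))
    declared = int(line[0:3])  # bbb: size of the line in bytes
    field = line[3:7]          # FFFF: the field indicator
    value = line[7:]           # vvv...: everything after the field
    if declared != len(line) + CRLF_BYTES:
        raise ValueError('declared size %d does not match line %r' % (declared, line))
    return declared, field, value


print(parse_bdt_line('0163102Walther'))  # (16, '3102', 'Walther')
```

For a line of the form bbbFFFFSSSSvvv..., the second indicator SSSS is simply the beginning of the returned value.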

Calculation of the size: where does the data end? // [for where it starts, please see above]

Example (no such line sequence exists; it only demonstrates the calculation):
---------------------------------------
length in bytes:
        10        20   
12345678901234567890

01380006100	= 13 = 11 bytes plus 2 (CR/LF)
014810000124	= 14 = 12 bytes plus 2 (CR/LF)
0123000003	= 12 ..
0273101von der Vogelweide
0163102Walther
017310311700101
0153106Zwettl
01031101

bbb FFFF SSSS 	// format
bbb FFFF vvv... 

013 8000 6100		8000: start of a block, 6100: the block holds patient data // !! this line's byte size is part of the total
014 8100 00124  	8100: total bytes = 124 = 14 (this line) + {12 + 27 + 16 + 17 + 15 + 10} (next lines) + 13 (line 8000)
012 3000 003		3000: patientId = 3
027 3101 von der Vogelweide 	3101: last name
016 3102 Walther 		3102: first name
017 3103 11700101 	3103: date of birth
015 3106 Zwettl	  	3106: town
010 3110 1	  	3110: m = 1, f = 2

So: read lines until sum[bytes per line] == value of field 8100 // !! the sum includes the byte sizes of the 8000 line and of the 8100 line itself
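
The reading rule above can be sketched as follows. The helper read_block and the sample list are illustrative only
and are not part of this program. In the sample, the 8100 value is set to 124, the byte sum of the eight lines listed,
so that the sketch is self-consistent. Lines are assumed to be given without CR/LF and to contain single-byte characters only.

```python
CRLF_BYTES = 2  # CR/LF at the end of each line counts towards its size


def read_block(lines, start=0):
    # Collect the lines of one data block that begins at lines[start].
    # The block starts with the 8000 line, followed by the 8100 line, which
    # declares the total size in bytes, including both of these lines.
    if lines[start][3:7] != '8000' or lines[start + 1][3:7] != '8100':
        raise ValueError('a block must start with the fields 8000 and 8100')
    total = int(lines[start + 1][7:])
    block, size = [], 0
    for line in lines[start:]:
        if size >= total:
            break  # declared size reached: the next line belongs to another block
        block.append(line)
        size += len(line) + CRLF_BYTES
    if size != total:
        raise ValueError('block size %d does not match declared size %d' % (size, total))
    return block


sample = [
    '01380006100',
    '014810000124',
    '0123000003',
    '0273101von der Vogelweide',
    '0163102Walther',
    '017310311700101',
    '0153106Zwettl',
    '01031101',
    '01380006200',  # first line of the next block; must not be consumed
]
block = read_block(sample)
print('%d lines, last: %s' % (len(block), block[-1]))  # 8 lines, last: 01031101
```

The sketch stops as soon as the running sum reaches the declared size, so it never needs an explicit end marker.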

==============================================================
==============================================================

'''
#==============================================================

import codecs
import datetime
from collections import OrderedDict
import os.path
import sys
import time

from lxml import etree

#==============================================================
#======================= A FIELDS =============================
#==============================================================

# 8000: start of a section
# 8100: length of section
# 3000: patientId
# 
# dict[field] = description. The description ..
#    - is written as table header for the database table
#    - is written as explanation into the lists


def getxDTBDTFieldsDefault(section): # section 0: metadata ('0022'),   1: patient ('6100'),   2: treatment ('6200'),   3: statement notes
    fields = OrderedDict()
 
#==============================================================
#
# >> Please add fields or change their order.
#
#    Put a # (hash) at the beginning of a line
#    if that field should not be read.
#
#============================================================== << please edit from here on

    if(section == 0): # 'Praxis'

        fields['9210'] = 'Version ADT-Satzbeschreibung'
        fields['9213'] = 'Version BDT' 

        fields['9600'] = 'Archivierungsart (1=Gesamt, 2=Zeitraum, 3=Quartal)'
        fields['9601'] = 'Zeitraum der Speicherung (TTMMJJJJTTMMJJJJ)'

#        fields['9206'] = 'Zeichensatz (encoding)' // empty # in 0023

    elif(section == 1): # 'Patient'

        fields['3102'] = 'Name        '
#        fields['3101'] = 'Nachname    '
        fields['3610'] = 'Patient seit'
        fields['3650'] = 'Diagnose    '
        fields['3108'] = 'Vers. Art   ' # Versichertenart MFR',	# 1=M,3=F,5=R

#        fields['3106'] = 'Ort         '
#        fields['3107'] = 'Str.        '
#        fields['3109'] = 'Hausnr.     '
#        fields['3103'] = 'geb.        '
#        fields['3110'] = 'm/f         '

    elif(section == 2): # 'Behandlung'

        fields['6205'] = 'Diagnose    '
        fields['6285'] = 'Behandlungszeit'

    elif(section == 3): # 'Abrechnungsnotizen'

        fields['5000'] = 'Termin      ' # Leistungstag
        fields['3673'] = 'Diagnose    '
        fields['3674'] = 'DSicherheit '

#============================================================== << please do not edit below this line

    else:
        pass # unknown section: an empty OrderedDict is returned

    return fields # OrderedDict: keeps the insertion order written above


def getxDTBDTFieldsDefaultAllSections():

    fields = getxDTBDTFieldsDefault(0) # OrderedDict, meta

    for i in [1,2,3]: # patient, treatment, note

        iFields = getxDTBDTFieldsDefault(i)

        for field in iFields:
            fields[field] = iFields[field]

    return fields # OrderedDict


def getxDTBDTFieldsFromCsvFile(filename, separator=';'): # utf8 please
    fields = OrderedDict()

    try:
        with codecs.open(filename,'r', 'utf8') as f:
            for line in f:
                line = line.strip()
                if(len(line) > 5): # xxxx plus separator = len(5)
                    elements = line.split(separator)

                    # skip lines without a separator, without a 4-character field, or without a description
                    if(len(elements) > 1 and len(elements[0].strip()) == 4 and len(elements[1].strip()) > 0):
                        field = elements[0].strip()
                        fields[field] = elements[1].strip()

        return fields
    except IOError: # file missing or not readable; an ImportError is never raised here
        print('\n Datei ' + filename + ' wurde nicht gefunden. Felddefinitionen aus bdt_extractor.py (getxDTBDTFieldsDefault) werden statt dessen verwendet.\n' )
    return getxDTBDTFieldsDefaultAllSections() # OrderedDict

'''
 https://github.com/ncqgm/gnumed/blob/master/gnumed/gnumed/client/business/gmXdtMappings.py
 by S.Hilbert, K.Hilbert; GPL v2
'''    
def getxDTBDTFieldsFromGNUmed():
    fields = OrderedDict()
    isNewlyCreated = False

    if(not os.path.isfile('fields.py')): # fields.py is written if not found
        writeGNUmedFieldsFile()
        time.sleep(3) # wait until the file is written
        isNewlyCreated = True
        print('\n Datei fields.py wurde erstellt.')

    try:
        # from gmXdtMappings import xdt_id_map # not in python 2 // instead a file fields.py is used, which contains only the xdt_id_map

        from fields import xdt_id_map # fields.py

        # please note: decode (not encode) will change the type (type: <str>) to unicode

        for field in sorted(xdt_id_map): # sorted by field number
            description = xdt_id_map[field]
            # Python 2: decode byte strings to unicode; Python 3: str is already text
            if isinstance(field, bytes):
                field = field.decode('utf-8')
            if isinstance(description, bytes):
                description = description.decode('utf-8')
            fields[field] = description

        return fields, isNewlyCreated

    except Exception: # e.g. ImportError: fields.py or its dict xdt_id_map is missing
        print('\n Eine Datei fields.py, darin ein Dict "xdt_id_map" wurde erwartet. Dies wurde nicht gefunden. Felddefinition aus bdt_extractor.py (getxDTBDTFieldsDefault) werden statt dessen verwendet.')
                    
    return getxDTBDTFieldsDefaultAllSections(), False # OrderedDict, set, False: file fields.py creation failed


def writeGNUmedFieldsFile(): # copied from the GNUmed project, GPL v2. Please mind the encoding

    filename = 'fields.py'

    text = GNUmedFieldsFileTxt.replace('"', '\'')
    if isinstance(text, bytes): # Python 2: decode the utf8 byte string once; Python 3: str needs no decoding
        text = text.decode('utf8')
    lines = text.split('@')

    with codecs.open(filename,'w', 'utf8') as f: # output: utf8
        for line in lines:
            f.write(line + '\n')


GNUmedFieldsFileTxt = '# -*- coding: utf-8 -*-@# https://github.com/ncqgm/gnumed/blob/master/gnumed/gnumed/client/business/gmXdtMappings.py@# GPL v2.0 or later@@xdt_id_map = {@@"6295": "??",@"6296": "??",@"6297": "??",@"6298": "??",@"6299": "??",@@"0101": "KBV-Prüfnummer",@"0102": "Softwareverantwortlicher /// xBDT: Softwarelizenz",@"0103": "Softwarename",@"0104": "Hardware",@"0105": "KBV-Prüfnummer",@"0111": "Email-Adresse des Softwareverantwortlichen",@"0121": "Strasse des Softwareverantwortlichen",@"0122": "PLZ des Softwareverantwortlichen",@"0123": "Ort des Softwareverantwortlichen",@"0124": "Telefonnummer des Softwareverantwortlichen",@"0125": "Telefaxnummer des Softwareverantwortlichen",@"0126": "Regionaler Systembetreuer",@"0127": "Strasse des Systembetreuers",@"0128": "PLZ des Systembetreuers",@"0129": "Ort des Systembetreuers",@"0130": "Telfonnummer des Systembetreuers",@"0131": "Telefaxnummer des Systembetreuers",@"0132": "Release-Stand der Software",@@"0201": "Arztnummer",@"0202": "Praxistyp",@"0203": "Arztname",@"0204": "Fachgebiet",@"0205": "Strasse der Praxisadresse",@"0206": "PLZ Ort der Praxisadresse",@"0207": "Arzt mit Leistungskennzeichen",@"0208": "Telefonnummer der Praxis",@"0209": "Telefaxnummer der Praxis",@"0210": "Modemnummer der Praxis",@"0211": "Arztname für Leistungsdifferenzierung",@"0213": "Leistungskennzeichen",@"0214": "Erläuterung zum Leistungskennzeichen",@"0215": "PLZ der Praxisadresse",@"0216": "Ort der Praxisadresse",@"0218": "E-Mail der Praxis/des Arztes",@"0225": "Anzahl der Ärzte",@@"0250": "Name erste freie Kategorie",@"0251": "Inhalt erste freie Kategorie",@@"0915": "PZN Medikament auf Kassenrezept",@"0917": "Packungsgrösse Medikament auf Kassenrezept",@"0918": "Packungsgrösse Medikament auf Privatrezept",@"0919": "Hilfsmittelbezeichnung",@"0920": "Hilfsmittelnummer",@"0922": "PZN Hilfsmittel",@"0923": "Anzahl Hilfsmittel",@"0925": "Heilmittel",@"0950": "PZN Dauermedikament",@"0951": "PZN Medikament auf 
Privatrezept",@"0952": "PZN Ärztemuster",@"0953": "Packungsgrösse Ärztemuster",@"0960": "Kennzeichnung Gebührenpflichtig",@"0961": "Kennzeichnung aut idem",@"0962": "Kennzeichnung noctu",@"0970": "Anzahl (Packungen) Medikament auf Rezept",@"0971": "Anzahl (Packungen) Medikament auf Privatrezept",@@"2002": "KASSENNAME für Albis (Quelle: mediSYS)",@@"2700": "IK des Krankenhauses",@"2701": "Fachgebiet laut LKA",@"2702": "Arztnummer des Anästhesisten",@"2706": "Indikationsschlüssel",@"2709": "Lfd. OP-Nummer",@"2710": "Lfd. OP-Nummer",@"2711": "OP-Datum",@"2720": "Blutung",@"2721": "Narkosezwischenfall",@"2722": "Pneumonie",@"2723": "Wundinfektion",@"2724": "Gefäss- oder Nervenläsion",@"2725": "Lagerungsschäden",@"2726": "Venenthrombose",@"2727": "Komplikation",@"2728": "Erfolgsbeurteilung hinsichtlich Indikationsstellung",@"2729": "Erfolgsbeurteilung hinsichtlich Histologie",@"2730": "Revisionseingriff",@"2731": "Stationäre Aufnahme",@"2732": "Angaben zu implantierten Materialien",@"2740": "Art der Operation",@"2741": "Dauer der Operation",@"2742": "Operierte Seite",@"2743": "Art der Anästhesie",@"2744": "Art der Anästhesie gemäss Klassifikation Strukturvertrag",@"2750": "Operateur hat Facharztstatus",@"2751": "Anzahl ärztl. Assistenten bei OP",@"2752": "(Ein) OP-Assistent hat Facharztstatus",@"2753": "Anzahl nichtärzticher Assistenten bei OP",@"2760": "Art der Anästhesie",@"2761": "Anästhesie erbracht",@"2762": "Dauer der Anästhesie",@"2770": "Blutung",@"2771": "Narkosezwischenfall",@"2772": "Pneumonie",@"2773": "Wundinfektion",@"2774": "Gefäss- oder Nervenläsion",@"2775": "Lagerungsschäden",@"2776": "Venenthrombose",@"2780": "Revisionseingriff erforderlich",@"2781": "Histologie",@"2782": "Stationäre Weiterbehandlung erforderlich",@@"3000": "Patientennummer/-kennung",@"3050": "Kürzel/lfd. 
Nummer",@"3100": "Namenszusatz/Vorsatzwort",@"3101": "Name des Patienten",@"3102": "Vorname des Patienten",@"3103": "Geburtsdatum des Patienten", @"3104": "Titel des Patienten",@"3105": "Versichertennummer des Patienten",@"3106": "PLZ/Wohnort des Patienten",@"3107": "Strasse/Hausnummer des Patienten",@"3108": "Versichertenart MFR", # 1=M,3=F,5=R@@"3109": "Hausnummer des Patienten", # 1=M,3=F,5=R # ADDED@@@"3110": "Geschlecht des Patienten", # 1=M,2=W or M/W/U@"3111": "Geburtsjahr des Patienten",@"3112": "PLZ des Patienten",@"3113": "Wohnort des Patienten",@"3114": "Wohnsitzländercode",@"3116": "KV-Bereich",@"3119": "Versicherten-ID (eGK)",@"3150": "Arbeitgeber", # nur bei header 0191@"3152": "Unfallversicherungsträger", # nur bei header 0191@@"3200": "Namenszusatz/Vorsatzwort des Hauptversicherten",@"3201": "Name des Hauptversicherten",@"3202": "Vorname des Hauptversicherten",@"3203": "Geburtsdatum des Hauptversicherten",@"3204": "Wohnort des Hauptversicherten",@"3205": "Strasse des Hauptversicherten",@"3206": "Titel des Hauptversicherten oder Familienverhältnis", # conflicting sources !@"3207": "PLZ des Hauptversicherten",@"3208": "Telefonnummer des Verletzten", # nur bei header 0191@"3209": "Wohnort des Hauptversicherten",@"3210": "Geschlecht des Hauptversicherten", # nur bei header 0191@@# scheinbar alter BDT ? (Quelle: mediSYS GmbH)@"3301": "Name des Patienten",@"3302": "Vorname des Patienten",@"3303": "Geburtsdatum des Patienten (TTMMJJ)",@"3306": "PLZ/Wohnort des Patienten",@"3307": "Straße/Hausnummer des Patienten",@"3308": "?? 
Status Patient",@@"3600": "Patientennummer (alter BDT ?, beobachtet bei Medistar)",@"3601": "Röntgennummer",@"3602": "Archivnummer",@"3603": "BG-Nummer",@"3610": "Datum Patient seit", # nur bei header 6100@"3612": "Datum Versichertenbeginn bei Kassenwechsel", # nur bei header 6100@"3620": "Beruf des Patienten", # nur bei header 6100@"3621": "Geschlecht des Patienten (Hilfsfeld, gestrichen)",@"3622": "Grösse des Patienten", # nur bei header 6100@"3623": "Gewicht des Patienten", # nur bei header 6100@"3625": "Arbeitgeber des Patienten", # nur bei header 6100@"3626": "Telefonnummer des Patienten", # nur bei header 6100@"3627": "Nationalität des Patienten", # nur bei header 6100@"3628": "Muttersprache des Patienten", # nur bei header 6100@"3630": "Arztnummer des Hausarztes", # nur bei header 6100@"3631": "Entfernung Wohnung-Praxis", # nur bei header 6100@"3635": "interne Zuordnung Arzt bei GP", # nur bei header 6100@"3637": "Rezeptkennung", # nur bei header 6100@"3649": "Dauerdiagnosen ab Datum", # nur bei header 6100@"3650": "Dauerdiagnosen", # nur bei header 6100@"3651": "Dauermedikamente ab Datum", # nur bei header 6100@"3652": "Dauermedikamente", # nur bei header 6100@"3654": "Risikofaktoren", # nur bei header 6100@"3656": "Allergien", # nur bei header 6100@"3658": "Unfälle", # nur bei header 6100@"3660": "Operationen", # nur bei header 6100@"3662": "Anamnese", # nur bei header 6100@"3664": "Anzahl Geburten", # nur bei header 6100@"3666": "Anzahl Kinder", # nur bei header 6100@"3668": "Anzahl Schwangerschaften", # nur bei header 6100@"3670": "Dauertherapie", # nur bei header 6100@"3672": "Kontrolltermine", # nur bei header 6100@"3673": "Dauerdiagnose (ICD-Code)",@"3674": "Diagnosensicherheit Dauerdiagnose",@"3675": "Seitenlokalisation Dauerdiagnose",@@"3700": "Name erste freie Kategorie", # nur bei header 6100@"3701": "Inhalt erste freie Kategorie", # nur bei header 6100@# 3704-3719 freie Kategorien@@"4101": "Abrechnungsquartal",@"4102": 
"Ausstellungsdatum",@"4103": "Gültigkeit",@"4104": "VKNR, Kassennummer",@"4105": "Geschäftsstelle der VK",@"4106": "Kostenträger-Untergruppe (KTAB)",@"4107": "Abrechnungsart",@"4109": "KVK: letzte Vorlage (TTMMJJ)",@"4110": "KVK: Gültigkeit bis",@"4111": "Krankenkassennummer (IK)",@"4112": "KVK: Versichertenstatus",@"4113": "KVK: Ost/West-Status/DMP-Kennzeichnung",@"4121": "Gebührenordnung",@"4122": "Abrechnungsgebiet",@"4123": "Personenkreis/Untersuchungskategorie",@"4124": "SKT-Zusatzangaben",@"4125": "Gültigkeitszeitraum von ... bis ...",@@"4201": "Ursache des Leidens",@"4202": "Unfall, Unfallfolgen",@"4203": "Früherkennung",@"4205": "MuVo-Datum",@"4206": "mutmasslicher Tag der Entbindung",@"4207": "Diagnose/Verdacht",@"4209": "erläuternder Text zur Überweisung",@"4210": "Ankreuzfeld LSR",@"4211": "Ankreuzfeld HAH",@"4212": "Ankreuzfeld ABO.RH",@"4213": "Ankreuzfeld AK",@"4215": "Konz. wegen (Text)",@"4217": "Vertragsarzt-Nr. des Erstveranlassers / Mit/Weiter (Text)", # conflicting sources@"4218": "Überweisung von Arztnummer",@"4219": "Überweisung von anderen Ärzten / an Name", # conflicting sources@"4220": "Überweisung an Fachgruppe",@"4221": "Kurativ // Präventiv / Sonstige Hilfen / bei belegärztlicher Behandlung",@"4222": "Kennziffer OI./O.II. // Prävention", # conflicting sources@"4223": "Kennziffer OIII. // Sonstige Hilfen", # conflicting sources@"4224": "AU bis",@"4233": "stationäre Behandlung von... 
bis...",@"4234": "anerkannte Psychotherapie",@"4235": "Datum des Anerkennungsbescheides",@"4236": "Klasse bei Behandlung",@"4237": "Krankenhausname",@"4238": "Krankenhausaufenthalt",@"4239": "Scheinuntergruppe",@"4243": "weiterbehandelnder Arzt",@"4261": "Kurart",@"4262": "Durchführung als Kompaktkur",@"4263": "genehmigte Kurdauer in Wochen",@"4264": "Anreisetag",@"4265": "Abreisetag",@"4266": "Kurabbruch am",@"4267": "Bewilligte Kurverlängerung in Wochen",@"4268": "Bewilligungsdatum Kurverlängerung",@"4269": "Verhaltenspräventive Massnahmen angeregt",@"4270": "Verhaltenspräventive Massnahmen durchgeführt",@"4271": "Kompaktkur nicht möglich",@@"4500": "Unfalltag",@"4501": "Uhrzeit des Unfalls",@"4502": "Eingetroffen in Praxis am",@"4503": "Uhrzeit des Eintreffens",@"4504": "Beginn der Arbeitszeit",@"4505": "Unfallort",@"4506": "Beschäftigung als",@"4507": "Beschäftigung seit",@"4508": "Staatsangehörigkeit",@"4509": "Unfallbetrieb",@"4510": "Unfallhergang",@"4512": "Verhalten des Verletzten nach dem Unfall",@"4513": "Erstmalige Behandlung",@"4514": "Behandlung durch",@"4515": "Art der ersten ärztlichen Behandlung",@"4520": "Alkoholeinfluß",@"4521": "Anzeichen eines Alkoholeinflusses",@"4522": "Blutentnahme zum c2h5oh-Nachweis",@"4530": "Befund",@"4540": "Röntgenergebniss",@"4550": "Art etwaiger Versorgung durch D-Arzt",@"4551": "krankhafte Verändrungen unabhängig vom Unfall",@"4552": "Bedenken gegen Angaben",@"4553": "Art der Bedenken gegen Angaben",@"4554": "Bedenken gegen Arbeistunfall",@"4555": "Art der Bedenken gegen Arbeitsunfall",@"4560": "arbeitsfähig",@"4561": "wieder arbeitsfähig ab",@"4562": "AU ausgestellt",@"4570": "besondere Heilbehandlung erforderlich",@"4571": "besondere Heilbehandlung durch",@"4572": "Anschrift behandelnder Arzt",@"4573": "AU ab",@"4574": "voraussichliche Dauer der AU",@"4580": "Rechnungsart",@"4581": "allgemeine Heilbehandlung durch",@"4582": "AU über 3 Tage",@"4583": "AU bescheinigt als",@"4584": "Nachschau erforderlich",@@"4601": 
"Rechnungsnummer",@"4602": "Rechnungsanschrift",@"4603": "überweisender Arzt",@"4604": "Rechnungsdatum",@"4605": "Endsumme",@"4608": "Abdingungserklärung vorhanden",@"4611": "Unterkonto Arzt",@"4613": "Anlage erforderlich",@"4615": "Kopfzeile",@"4617": "Fußzeile",@@"5000": "Leistungstag",@"5001": "Gebührennummer",@"5002": "Art der Untersuchung",@"5003": "Empfänger des Briefes",@"5004": "Kilometer",@"5005": "Multiplikator / Anzahl GNR",@"5006": "Um-Uhrzeit",@"5007": "Bestellzeit-Ausführungszeit",@"5008": "Doppelkilometer",@"5009": "freier Begründungstext",@"5010": "Medikament als Begründung",@"5011": "Sachkostenbezeichnung",@"5012": "Sach-/Materialkosten in Cent",@"5013": "Prozent der Leistung",@"5015": "Organ",@"5016": "Name des Arztes",@"5017": "Besuchsort bei Hausbesuchen",@"5018": "Zone bei Besuchen",@"5019": "Erbringungsort,Standort des Gerätes",@"5023": "GO-Nummern-Zusatz",@"5024": "GNR-Zusatzkennzeichen für poststationär erbrachte Leistungen",@"5060": "Beschreibung der GNR",@"5061": "Gebühr",@"5062": "Faktor",@"5063": "Betrag",@"5064": "Endsumme Privatrechnung",@"5090": "Honorarbezeichnung",@"5091": "Gutachtenbezeichnung",@@"6000": "Abrechnungsdiagnosen // xBDT: Diagnose",@"6001": "ICD-Schlüssel",@"6003": "Diagnosensicherheit",@"6004": "Seitenlokalisation",@"6005": "Histologischer Befund bei Malignität",@"6006": "Diagnosenerläuterung",@@"6200": "Behandlungsdaten gespeichert am",@"6205": "aktuelle Diagnose",@"6210": "Medikament verordnet auf Kassenrezept",@"6211": "Medikament verordnet auf Privatrezept",@"6215": "Ärztemuster",@"6220": "Befund",@"6221": "Fremdbefund",@"6222": "Laborbefund",@"6225": "Röntgenbefund",@"6230": "Blutdruck",@"6240": "Symptome",@"6260": "Therapie",@"6265": "physikalische Therapie",@"6280": "Überweisung Inhalt",@"6285": "AU Dauer (von - bis)",@"6286": "AU wegen",@"6287": "AU wegen (ICD-Code)",@"6288": "Diagnosesicherheit AU wegen",@"6289": "Seitenlokalisation AU wegen",@"6290": "Krankenhauseinweisung, Krankenhaus",@"6291": 
"Krankenhauseinweisung",@"6292": "Krankenhauseinweisung wegen (ICD-Code)",@"6293": "Diagnosesicherheit Krankenhauseinweisung wegen",@"6294": "Seitenlokalisation Krankenhauseinweisung wegen",@@"6300": "Bescheinigung",@"6301": "Inhalt der Bescheinigung",@"6306": "Attest",@"6307": "Inhalt des Attestes",@"6310": "Name des Briefempfängers",@"6311": "Anrede",@"6312": "Strasse",@"6313": "PLZ",@"6314": "Wohnort",@"6315": "Schlusssatz",@"6316": "Telefonnummer",@"6317": "Telefax",@"6319": "Arztnummer/Arztident",@"6320": "Briefinhalt",@"6325": "Bild-Archivierungsnummer",@"6326": "Graphikformat",@"6327": "Bildinhalt",@# 63xx und 63xx+1 belong to each other in pairs up to 6398/99@"6330": "freie Kategorie 1: Name",@"6331": "freie Kategorie 1: Inhalt",@@"7100": "Namenszusatz",@"7101": "Name",@"7102": "Vorname",@"7103": "Geburtsdatum",@"7104": "Titel",@"7106": "PLZ/Ort",@"7107": "Straße",@"7110": "Geschlecht: 1=männlich, 2=weiblich, 8=gemischt (Gemeinschaftspraxen o.ä.)",@"7112": "PLZ",@"7113": "Wohn-/Praxisort",@@"7200": "xBDT: Typ Textbaustein/Medikament (0=Medikament, 1=BTM, 2=Heilmittel, 3=Hilfsmittel, 4=Impfstoff, 5=Sprechstundenbedarf)",@"7201": "xBDT: KV-Nummer/Hinweise/Name /// AOK-DMP (D.M.): 1.-3. Stelle der Postleitzahl",@"7202": "xBDT: Fachrichtung/Textbaustein/PZN /// AOK-DMP (D.M.): Nummer des Diabetes-Paß",@"7203": "Telefon/Preis",@"7204": "Funktelefon/Festbetrag",@"7205": "Telefax/Negativliste (1=auf Liste)",@"7206": "E-Mail-Adresse/Packungsgröße",@"7207": "Kurzanrede/Wirkstoff",@"7208": "Briefanrede/Indikation",@"7209": "Briefschluß/Nebenwirkungen",@"7210": "Ansprechpartner/Gegenanzeigen /// AOK-DMP (D.M.): Datum der Erstmeldung",@"7211": "Vertretung/Wechselwirkungen",@"7212": "Bankname/Hinweise /// AOK-DMP (D.M.): bereits v. 
SSP mitbetreut; 1=nein, 2=ja",@"7213": "BLZ/Alternativmedikamente",@"7214": "Kontonummer",@"7215": "Bemerkung /// AOK-DMP (D.M.): Schulungsstatus; 1=nicht 2=geschult",@"7216": "Sonstiges /// AOK-DMP (D.M.): Jahr der letzten Schulung; Vorgabe 1979",@"7217": "Gruppenkennzeichen: 1=Arztkollege, 2=Arbeitgeber, 4=Krankenhaus, 5=BG, 6=Sonstige",@"7218": "Internet-Adresse",@@"7220":"AOK-DMP (D.M.): Schulung laut Vertrag durchgeführt; ja, nein",@"7221":"AOK-DMP (D.M.): Begründung für keine Schulung; 1 bis 5",@"7222":"AOK-DMP (D.M.): Klartext für Sonstige 7221 = 5",@"7223":"AOK-DMP (D.M.): Schulungsprogramm; 1 bis 17",@"7224":"AOK-DMP (D.M.): Schulungsinstitution; 1 bis 4",@"7226":"AOK-DMP (D.M.): Schwangerschaft; 1=nein, 2=ja",@"7227":"AOK-DMP (D.M.): Mitglied Selbsthilfegruppen; 1=nein, 2=ja",@"7228":"AOK-DMP (D.M.): Überweisung SPP/HA veranlasst ?; 1=nein, 2=ja",@"7229":"AOK-DMP (D.M.): Begründung für keine Überweisung;1 bis 4",@"7230":"AOK-DMP (D.M.): Klartext Sonstiges 7229 = 5",@@"8000": "Satzidentifikation >>===============",@"8100": "Satzlänge",@@"8301": "Eingangsdatum des Auftrags im Labor", ## nicht in GDT 2.1 Specs (KS)@"8302": "Berichtsdatum", ## nicht in GDT 2.1 Specs (KS)@"8303": "Berichtszeit", ## nicht in GDT 2.1 Specs (KS)@"8310": "Anforderungsnummer", @"8311": "(interne) Auftragsnummer des Labors",## nicht in GDT 2.1 Specs (KS)@"8312": "Kunden- bzw. 
Arztnummer",## nicht in GDT 2.1 Specs (KS)@"8315": "GDT-ID Empfänger",@"8316": "GDT-ID Sender",@"8320": "Labor Bezeichnung", ## nicht in GDT 2.1 Specs (KS)@"8321": "Labor Strasse", ## nicht in GDT 2.1 Specs (KS)@"8322": "Labor PLZ", ## nicht in GDT 2.1 Specs (KS)@"8323": "Labor Ort", ## nicht in GDT 2.1 Specs (KS)@@"8401": "Befundstatus (E=End, T=Teil, V=Vor, A=Archiv)",@"8402": "Geräte-/Verfahrensspezifisches Kennfeld",@"8403": "Gebührenordnung",@"8404": "Kosten in Doppelpfennigen",@"8406": "Kosten in Cent",@"8407": "Geschlecht Patient", ## nicht in GDT 2.1 Specs (KS)@"8410": "Test-Ident/LDT-Kürzel",@"8411": "Testbezeichnung",@"8417": "Zuordnung (A,D,T,L...) neu für KVT",@"8418": "Teststatus",@"8420": "Ergebnis-/Meßwert",@"8421": "Einheit",@"8422": "Grenzwert Indikator",@"8428": "Probematerial-Ident",@"8429": "Probenmaterial-Nummer",@"8430": "Probenmaterial-Bezeichnung",@"8431": "Material_Spezifikation",@"8432": "Abnahme-Datum",@"8433": "Abnahme-Zeit",@"8440": "Keim-Ident",@"8441": "Keim-Bezeichnung",@"8442": "Keim-Nummer",@"8443": "Methode der Resistenzbestimmung",@"8444": "Wirkstoff-Ident",@"8445": "Wirkstoff-Generic-Nummer",@"8446": "MHK/Breakpoint",@"8447": "Resistenz-Interpretation",@"8460": "Normalwert-Text",@"8461": "Normalwert untere Grenze",@"8462": "Normalwert obere Grenze",@"8470": "Anmerkung",@"8480": "Ergebnis-Text",@"8485": "Zielwert KVT",@"8486": "Ersteintritt",@"8490": "Abschluss-Zeile",@@"8609": "Gebührenordung",@"8990": "Signatur",@@"9100": "Arztnummer des Absenders",@"9102": "Empfänger",@"9103": "Erstellungsdatum (TTMMJJJJ)",@"9105": "laufende Nummer Datenträger im Paket (xBDT: immer 1)",@"9106": "verwendeter Zeichensatz (1=7, 2=8-bit-Code)",@"9115": "Erstellungsdatum ADT-Datenpaket",@"9116": "Erstellungsdatum KADT-Datenpaket",@"9117": "Erstellungsdatum AODT-Datenpaket",@"9118": "Erstellungsdatum STDT-Datenpaket",@"9132": "enthaltene Datenpakete dieser Datei",@@"9202": "Gesamtlänge Datenpaket (Byte)",@"9203": "Anzahl Datenträger im 
Paket",@"9204": "Abrechnungsquartal",@"9206": "Zeichensatz (encoding)",@"9210": "Version ADT-Satzbeschreibung",@"9211": "Version Satztabelle ADT",@"9212": "Version der Satzbeschreibung",@"9213": "Version BDT",@"9218": "Version GDT",@"9233": "GO-Stammdatei-Version",@@"9600": "Archivierungsart (1=Gesamt, 2=Zeitraum, 3=Quartal)",@"9601": "Zeitraum der Speicherung (TTMMJJJJTTMMJJJJ)",@"9602": "Beginn der Übertragung (HHMMSSCC)",@@"9901": "Systeminterner Parameter /// xBDT: Praxishaupttyp bei untergeordneten Praxen"@}'

# ========================================================================== 

def writeFile(filename, lines, ending=None):
    """Write the lines to 'neu_<filename>_<timestamp>.<ending>' as UTF-8 (default ending: txt)."""

    nowF = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    if(ending is None):
        ending = 'txt'
    filenameD = filename + '_' + nowF + '.' + ending

    with codecs.open("neu_" + filenameD,'w', 'utf8') as f: # output: utf8
        for line in lines:
            f.write(line + '\n')

def returnFieldAndDescriptionAsLines(fieldDict):
    result = []
    for field in fieldDict:
        value = fieldDict[field].replace(':',';') # replace ':' in the description, so that ':' stays unambiguous as the separator
        result.append(field + ': ' + value)
    return result


def replaceDangerousCharsForCsvInLines(lines):
    result = []
    for line in lines:
        line = line.replace(';',',') # ; to ,
        line = line.replace(':',';') # now ; as separator
        line = line.replace('"','\'') # " to '
        result.append(line)
    return result
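

# --- Sketch (not used by the program): an alternative to replacing characters by hand.
# The csv module of the standard library quotes a value that contains the separator,
# so the original characters are kept. The helper name formatCsvRowSketch is hypothetical.

import csv
import io

def formatCsvRowSketch(values, delimiter=';'):
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator='')
    writer.writerow(values) # values containing the delimiter or a quote character are quoted
    return buffer.getvalue()

# usage: formatCsvRowSketch(['a;b', 'c']) returns '"a;b";c'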


#==============================================================
#=============== B1 CREATE TREE - private methods =============
#==============================================================
'''
 isUseAllFields = True: collect all the information (all fields)

 isUseAllFields = False: collect only the information of the specified fields, passed in the parameter fields
                         (if fields is None or empty, the default fields of all sections are used)

   noteIndicators = {'0101', '0102', '0190'} # indicators are hardcoded for better understanding.
   metaIndicators = {'0022'}                 # to use the passed parameters instead, change readLinesReturnFlatTree()

   to run it on its own (copy this part out): readLinesReturnFlatTree(lines, None, None, True, None) # lines = lines of the BDT file

'''
def readLinesReturnFlatTree(lines, noteIndicators, metaIndicators, isUseAllFields = False, fields = None):

        fieldKeys = None
         
        if(not isUseAllFields):
            if (fields is not None) and (len(fields) > 0): # if fields are given            
                fieldKeys = set(fields) # keys as set; because the order is given by the input lines' sequence
            else:
                fieldKeys = set(getxDTBDTFieldsDefaultAllSections())

        unused = []            

        recordLength = recordLengthCalculated = 0
        isInData = isFindLength = False

        root = etree.Element("root")        
        currentNodeToAdd = None
        currentNodeType = ''

        noteIndicator = '' # a property is preferable, the lxml lib does not provide it
        
        isMetaData = False
        isMetaCollected = False
        
        for line in lines: 

            if(isInData):
                if(recordLength == recordLengthCalculated and recordLength > 0):

                    isInData = False
                    currentNodeToAdd = None


                    if(line[3:7] == '8000'): # specified. please don't change

                        isInfoToCollect = False

                        if(line[7:11] == '6100'): # specified.
                            currentNodeType = 'patientdata'
                            isInfoToCollect = True

                        elif(line[7:11] == '6200'): # specified.
                            currentNodeType = 'treatment'
                            isInfoToCollect = True

                        elif(line[7:11] in {'0101', '0102', '0190'}): # or: noteindicators
                            currentNodeType = 'note'
                            isInfoToCollect = True

                            noteIndicator = line[7:11] # remember the field

                        elif(line[7:11] in {'0022'}): # or: metaindicators
                            currentNodeType = 'meta'                             

                            if(not isMetaCollected):
                                isInfoToCollect = True # one-time metadata collection (they are repeated) 

                            
                        if(isInfoToCollect):
                            recordLengthCalculated = int(line[0:3])
                            isFindLength = True
                             
                        continue

                    unused.append(line)

                    continue

                else: 
                    field = line[3:7]                    

                    if(field == '3000'): # patientId specified.
                        patientId = etree.SubElement(currentNodeToAdd, "patientId")
                        patientId.text = str( int(line[7:]))
                    else:
                        
                        if(isUseAllFields or (field in fieldKeys)): # (!!) here: decide whether to take the info or skip it              

                            if(len( line[7:].strip() ) > 0):                                
                                fieldNode = etree.SubElement(currentNodeToAdd, 'f'+field) # add prefix, can't start with a digit
                                fieldNode.text = line[7:]                       

                    recordLengthCalculated = recordLengthCalculated + int(line[0:3])


            elif(isFindLength):

                if(line[3:7] == '8100'): # specified.

                    if(currentNodeType == 'meta'):
                        isMetaCollected = True # one-time metadata collection (they are repeated)

                    currentNodeToAdd = etree.SubElement(root, currentNodeType) # (!!) add the node here
                    if(currentNodeType == 'note'):
                        noteIndicatorNote = etree.SubElement(currentNodeToAdd, 'ffield') # add prefix 'f'
                        noteIndicatorNote.text = noteIndicator

                    recordLength = int(line[7:])
                    recordLengthCalculated = recordLengthCalculated + int(line[0:3])

                    isFindLength = False
                    isInData = True

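                else:
                    # the record length field 8100 must directly follow the record identification 8000
                    raise ValueError('Expected field 8100 (record length) after field 8000, found: ' + line)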

 
            else:
                if(line[3:7] == '8000'): # specified.

                    isInfoToCollect = False

                    if(line[7:11] == '6100'): # specified.
                        currentNodeType = 'patientdata'
                        isInfoToCollect = True

                    elif(line[7:11] == '6200'): # specified.
                        currentNodeType = 'treatment'
                        isInfoToCollect = True

                    elif(line[7:11] in {'0101', '0102', '0190'}): # or: noteindicators
                        currentNodeType = 'note'
                        isInfoToCollect = True

                        noteIndicator = line[7:11]

                    elif(line[7:11] in {'0022'}): # or: metaindicators
                        currentNodeType = 'meta'
                        isInfoToCollect = True

                        if(isMetaCollected):
                            isInfoToCollect = False # one-time metadata collection (they are repeated, keep it small)                     

                            
                    if(isInfoToCollect):
                        recordLengthCalculated = int(line[0:3])
                        isFindLength = True

                else:
                    unused.append(line)

                
        return root, unused
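

# --- Sketch (not used by the program): the line layout that the reader above relies on.
# Each BDT line is read as: 3 characters line length, 4 characters field id, then the content.
# Assumption: the length counts these 7 characters, the content and the 2 characters CR/LF.
# The helper name splitBdtLineSketch and the sample line are hypothetical.

def splitBdtLineSketch(line):
    length = int(line[0:3]) # e.g. 13
    field = line[3:7]       # e.g. '8000' (record identification)
    content = line[7:]      # e.g. '6100' (patient data)
    return length, field, content

# usage: splitBdtLineSketch('01380006100') returns (13, '8000', '6100')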


#==============================================================
'''
#   input:   <root>
                 <meta/>
                 <patientdata><patientId/></patientdata>
                 <treatment><patientId/></treatment>
                 <note><patientId/></note>
             </root>

#   output : <root>
                 <meta/>
                 <patients>
                    <patient>
                       <patientId/>
                       <patientdata/>
                       <treatment/>
                       <note/>
                    </patient>
                 </patients>
             </root>

    The patientId is moved from the patientdata, treatment and note nodes up to the patient node.
    Treatment and note nodes without data (only a patientId) are dropped.
'''
def modifyTreeToHierarchical(root): # a flat tree as generated by readLinesReturnFlatTree() is expected

# ---- issue: tree change can exceed RAM memory --------------------
#
#         so: work with text-chunks: 
#
#         > cut branch from the tree > to text > 
#
#         > this small part of the text > to a 'mini tree' > 
#         > change this mini tree > to text >
#
#         do this for all parts. then: 
#          > concatenate all texts > to tree


    # ---------- store the data as text. there is only one tree in the RAM, so it would be overwritten
    #            encoding='unicode' makes tostring() return text (not bytes) that can be concatenated below

    patientdataTxtList = []
    for patientdata in root.xpath('//patientdata'):
        patientdataTxtList.append(etree.tostring(patientdata, encoding='unicode').strip()) # list as text elements

    treatmentTxtList = []
    for treatment in root.xpath('//treatment'):
        treatmentTxtList.append(etree.tostring(treatment, encoding='unicode').strip()) 

    noteTxtList = []
    for note in root.xpath('//note'):
        noteTxtList.append(etree.tostring(note, encoding='unicode').strip()) 

    metaTxtList = []
    for meta in root.xpath('//meta'):
        metaTxtList.append(etree.tostring(meta, encoding='unicode').strip()) 

    root = None

    # ---------- process the text parts

    treatmentDict = {} # store all data with the patientId as key
    pIdsWithTreatSet = set()

    for treatmentTxt in treatmentTxtList:
 
        xml = '<root>' + treatmentTxt + '</root>'
        miniTree = etree.fromstring(xml)

        patientIdNode = miniTree.xpath('//patientId')[0]
        patientId = patientIdNode.text
        pIdsWithTreatSet.add(patientId) # does the patient have a treatment node (data filled out or not)

        treatmentNode = miniTree.xpath('//treatment')[0] # data filled out? 
        if(len(treatmentNode) < 2): # number of children; the 1st child is the patientId
            continue # if no data filled, skip it
        
        patientIdNode.getparent().remove(patientIdNode) # delete the patientId-node
        
        # collect the treatments of one patient as a list of text elements
        treatmentDict.setdefault(patientId, []).append(etree.tostring(treatmentNode, encoding='unicode').strip())
     
    # # test:
    #outputTxt = ''
    #for patientId in treatmentDict:
    #   outputList = treatmentDict[patientId]
    #   for outputListElement in outputList:
    #       outputTxt = outputTxt + outputListElement.strip()
    #
    #outputXml = '<root>' + outputTxt + '</root>'
    #outputTree = etree.fromstring(outputXml)
    #print etree.tostring(outputTree, pretty_print=True)


    # ---------- the same code as for treatment

    noteDict = {}
    pIdsWithNoteSet = set()

    for noteTxt in noteTxtList:

        xml = '<root>' + noteTxt + '</root>'
        miniTree = etree.fromstring(xml)

        patientIdNode = miniTree.xpath('//patientId')[0]
        patientId = patientIdNode.text
        pIdsWithNoteSet.add(patientId) # does the patient have a note node (data filled out or not)

        noteNode = miniTree.xpath('//note')[0] # data filled out? 
        if(len(noteNode) < 2): # number of children; the 1st child is the patientId
            continue # if no data filled, skip it
        
        patientIdNode.getparent().remove(patientIdNode) # delete the patientId-node
        
        # collect the notes of one patient as a list of text elements
        noteDict.setdefault(patientId, []).append(etree.tostring(noteNode, encoding='unicode').strip())
     
    # # test:
    #outputTxt = ''
    #for patientId in noteDict:
    #   outputList = noteDict[patientId]
    #   for outputListElement in outputList:
    #       outputTxt = outputTxt + outputListElement.strip()
    
    #outputXml = '<root>' + outputTxt + '</root>'
    #outputTree = etree.fromstring(outputXml)
    #print etree.tostring(outputTree, pretty_print=True)

    # ----------

    patientdataDict = {}

    for patientdataTxt in patientdataTxtList:

        xml = '<root>' + patientdataTxt + '</root>'
        miniTree = etree.fromstring(xml)
       
        patientIdNode = miniTree.xpath('//patientId')[0]
        patientId = patientIdNode.text
 
        patientIdNode.getparent().remove(patientIdNode) # delete the patientId-node

        patientdata = miniTree.xpath('//patientdata')[0]
 
        patientdataDict[patientId] = etree.tostring(patientdata, encoding='unicode').strip()

    # # test:
    #outputTxt = ''
    #for patientId in patientdataDict:
    #    outputTxt = outputTxt + patientdataDict[patientId]
    #
    #xml = '<root>' + outputTxt + '</root>'
    #outputTree = etree.fromstring(xml)
    #print etree.tostring(outputTree, pretty_print=True)


    metaTxt = ''
    for meta in metaTxtList:
        metaTxt = metaTxt + meta.strip()

    # ---------- concatenate:

    patientsTxt = ''

    for patientId in patientdataDict:

        patientIdTxt = '<patientId>' + patientId + '</patientId>'

        treatmentsTxt = ''
        if(patientId in treatmentDict):
            treatments = treatmentDict[patientId]
            for treatment in treatments:
                treatmentsTxt = treatmentsTxt + treatment.strip() 

        notesTxt = ''
        if(patientId in noteDict):
            notes = noteDict[patientId]
            for note in notes:
                notesTxt = notesTxt + note.strip() 

        patientsTxt = patientsTxt + '<patient>' + patientIdTxt + patientdataDict[patientId] + treatmentsTxt + notesTxt + '</patient>'

    
    xml = '<root>' + metaTxt + '<patients>' + patientsTxt + '</patients></root>'
    root = etree.fromstring(xml)

    # print etree.tostring(root, pretty_print=True)

    return root, pIdsWithTreatSet, pIdsWithNoteSet, len(treatmentTxtList), len(noteTxtList) # tree, set, set, int, int


#==============================================================
#================ B2 CREATE TREE - public methods =============
#==============================================================

# Please note! BDT file is _not_ unicode/utf8 but iso-8859-1

def readBDTFile(filename):
    lines = []

    with codecs.open(filename,'r', "iso-8859-1") as f:
        for line in f:
            line = line.replace('\r', '') # remove carriage return (CR)
            line = line.replace('\n', '') # remove line feed (LF)
            line = line.strip()
            if(len(line) > 0):
                lines.append(line)
    return lines
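

# --- Sketch (not used by the program): the same clean-up for data that is already in memory.
# The bytes are decoded as iso-8859-1, split at the line feeds, stripped, and empty lines are dropped.
# The helper name decodeBdtBytesSketch and the sample bytes are hypothetical.

def decodeBdtBytesSketch(rawBytes):
    text = rawBytes.decode('iso-8859-1')
    # strip() also removes the carriage return that is left at the end of each line
    return [line.strip() for line in text.split('\n') if len(line.strip()) > 0]

# usage: decodeBdtBytesSketch(b'0153101M\xfcller\r\n\r\n') returns ['0153101Müller']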


'''
Billing note fields: record description (Satzbeschreibung):

1.1. KVDT billing: "0101": medical treatment, "0102": referral case, 
"0103": treatment by an attending physician (Belegarzt), "0104": emergency service / substitute / emergency, 
"sad1": SADT outpatient treatment, "sad2": SADT referral, "sad3": SADT treatment by an attending physician (Belegarzt), 
"0109": treatment by a spa physician (Kurarzt). 
1.2. Selective contract types: "GEVK", "HÄVG", "MEDI", "KV".
1.3. Private billing: "padx".

assumption:
0190 private
'''
def readDataReturnTree(lines, isUseAllFields = False, fieldDict = None):

    # patientIndicator = 6100, treatmentIndicator = 6200

    noteIndicators = {'0101', '0102', '0190'} # (!!) indicators are hardcoded, in the tree-reader readLinesReturnFlatTree();

    metaIndicators = {'0022'} # (!!) hardcoded .. please change readLinesReturnFlatTree() if you want to use the passed parameters

    indicators = {}
    indicators[0] = metaIndicators
    indicators[3] = noteIndicators


    flatTree, unused = readLinesReturnFlatTree(lines, noteIndicators, metaIndicators, isUseAllFields, fieldDict)

    #print etree.tostring(flatTree, pretty_print=True)

    hierarchicalTree, pIdsWithTreatSet, pIdsWithNoteSet, numTreatFoundInFile, numNotesFoundInFile = modifyTreeToHierarchical(flatTree)

    #print etree.tostring(hierarchicalTree, pretty_print=True)

    return hierarchicalTree, unused, pIdsWithTreatSet, pIdsWithNoteSet, numTreatFoundInFile, numNotesFoundInFile, indicators

#==============================================================
#====================== C USE THE TREE ========================
#==============================================================

def getTreeStatistics(root):
    noOfPatientsWithoutTreatment = noOfPatientWithoutNote = 0  

    patientNodeList = root.xpath('//patient')
    countTotalP = len(patientNodeList)
    
    countTotalT = len(root.xpath('//treatment'))
    countTotalN = len(root.xpath('//note'))

    for patient in patientNodeList:
 
        # patient without: a) no node was found b) node was found, but data was not written (fields, but empty)
        # './' searches below this patient only; '//' would search the whole document
 
        if(len(patient.xpath('./treatment')) < 1): 
            noOfPatientsWithoutTreatment = noOfPatientsWithoutTreatment + 1  

        if(len(patient.xpath('./note')) < 1): 
            noOfPatientWithoutNote = noOfPatientWithoutNote + 1  
                                                   
    return countTotalP, countTotalT, countTotalN, noOfPatientsWithoutTreatment, noOfPatientWithoutNote 
 

def getTreeFields(root):
    pFieldSet = set()
    tFieldSet = set()
    nFieldSet = set()
       
    for field in root.xpath('//patientdata/*'): 
        pFieldSet.add(field.tag[1:]) # cut prefix 'f'

    for field in root.xpath('//treatment/*'): 
        tFieldSet.add(field.tag[1:]) # cut prefix 'f'

    for field in root.xpath('//note/*'): 
        nFieldSet.add(field.tag[1:]) # cut prefix 'f'
             
    nFieldSet.discard('field') # remove the added note field 'ffield'; discard() raises no error if there are no notes
 
    return pFieldSet, tFieldSet, nFieldSet


#==============================================================
#======================== D OUTPUT ============================
#==============================================================

def printMeta(metaNode): # defined fields of getxDTBDTFieldsDefault() only

    fields = getxDTBDTFieldsDefault(0)

    metaDict = {}
    for child in metaNode: # iterates over the child nodes
        metaDict[child.tag[1:]] = child.text # cut the added prefix 'f', e.g. f1234 to 1234

    for field in fields: # maintain the order specified in the method getxDTBDTFieldsDefault
        if(field in metaDict):
            print(fields[field] + ': ' + metaDict[field])

#============================================================== Database tables: default fields of getxDTBDTFieldsDefault() only

# Table headers for database csv files: default fields only of getxDTBDTFieldsDefault()

def getTableHeaderFields(fieldsToUseKeyValue):

    result = ""

    for field in fieldsToUseKeyValue:
        title = fieldsToUseKeyValue[field].strip()
        if(len(title) < 1):
            title = field
        title = title.replace(';', ',') # str.replace() returns a new string
        result = result + ';' + title

    return result[1:] # strip the first semicolon ;


def returnPatientTableForDatabase(root): # default fields only
    
    fields = getxDTBDTFieldsDefault(1) # 1 = patient

    tableHeader = 'PatientId;' + getTableHeaderFields(fields)

    table = []
    table.append(tableHeader)

    fieldList = list(fields) # the field keys, in the order of the field list

    for patient in root.xpath('//patient'):
       
        tableRecord = ''

        for field in fieldList:
            value = ''
            for element in patient.xpath('./patientdata/f' + field): # relative path: data of this patient only
                value = element.text.replace(';',',')
                break # TODO if more nodes of same field

            tableRecord = tableRecord + ';' +  value

        tableRecord = str(patient.xpath('./patientId/text()')[0]) + tableRecord # has a ';' in front
        table.append(tableRecord)       

    return table


def returnTreatmentTableForDatabase(root): # default fields only
    
    fields = getxDTBDTFieldsDefault(2) # treatment

    tableHeader = getTableHeaderFields(fields)
    tableHeader = 'PatientId;BehandlungsId;' + tableHeader

    table = []
    table.append(tableHeader)

    fieldList = list(fields)

    treatmentId = 1 # generated only # TODO add to tree?

    for treatment in root.xpath('//treatment'):

        tableRecord = ''

        for field in fieldList:
            value = ''
            for element in treatment.xpath('./f' + field): # relative path: fields of this treatment only
                value = element.text.replace(';',',')
                break # TODO if more nodes of the same field - concat?

            tableRecord = tableRecord + ';' +  value

        # the patientId is a child of the parent patient node
        tableRecord = str(treatment.xpath('../patientId/text()')[0]) + ';' + str(treatmentId) + tableRecord # has a ';' in front
        table.append(tableRecord) 

        treatmentId = treatmentId + 1 

    return table


def returnNotesTableForDatabase(root): # defined fields only

    fields = getxDTBDTFieldsDefault(3) # notes

    tableHeader = getTableHeaderFields(fields)
    tableHeader = 'PatientId;NotizId;Notizart;' + tableHeader

    table = []
    table.append(tableHeader)

    fieldList = list(fields)

    # TODO verify: do the fields belong together? e.g. date- diagnose- dauer-diagnose to one record?
        
    noteId = 1 # generated only # TODO add to tree?

    for note in root.xpath('//note'):
      
        tableRecord = ''

        noteField = note.xpath('./ffield/text()')[0] # bdt field of this note, added with a generated node-name
 
        for field in fieldList:
            value = ''
            for element in note.xpath('./f' + field): # relative path: fields of this note only
                value = element.text.replace(';',',')
                break # TODO if more nodes of the same field - concatenate?

            tableRecord = tableRecord + ';' +  value

        # the patientId is a child of the parent patient node
        tableRecord = str(note.xpath('../patientId/text()')[0]) + ';' + str(noteId) + ';' + noteField + tableRecord # has a ';' in front
        table.append(tableRecord) 

        noteId = noteId + 1 

    return table

#============================================================== Lists: specified fields (= the custom field list read in Main)

def writePatientList(root, filename='Liste_P', limitNumberOfPatients=0, fieldsToTake = None):
        
    fields = getxDTBDTFieldsDefault(1) # 1 = patient, default field list of getxDTBDTFieldsDefault()

    if(fieldsToTake is not None): # (!) custom field list, read in Main
        fields = fieldsToTake # use the list to set the order
         
    nowF = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    filenameD = filename + '_' + nowF + '.txt'

    count = 1

    with codecs.open(filenameD,'w', 'utf8') as f: # output: utf8

        for patientdata in root.xpath('//patientdata'):

            if(limitNumberOfPatients > 0 and count > limitNumberOfPatients):
                break
            count = count + 1

            f.write('\n\nPatientennummer ' + patientdata.xpath('../patientId/text()')[0] + '\n') # the patientId is a child of the parent node
           
            for field in fields:
                for element in patientdata.xpath('./f' + field): # relative path: fields of this patient only
                    f.write(fields[field] + ': ' + element.text + '\n')


def writePatientTreatmentNoteList(root, filename='Liste_PBA', limitNumberOfPatients=0, fieldsToTake = None): 

    fieldsP = getxDTBDTFieldsDefault(1) # patient, default fields
    fieldsT = getxDTBDTFieldsDefault(2) # treatment
    fieldsN = getxDTBDTFieldsDefault(3) # notes

    if(fieldsToTake is not None):
        fieldsP = fieldsT = fieldsN = fieldsToTake

    nowF = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    filenameD = filename + '_' + nowF + '.txt'
    
    count = 1

    with codecs.open(filenameD,'w', 'utf8') as f: # output: utf8
           
        for patient in root.xpath('//patient'):
     
            if(limitNumberOfPatients > 0 and count > limitNumberOfPatients): # limit number of output
                break
            count = count + 1

            # relative paths ('./'): search below this patient only; '//' would search the whole document
            f.write('\n\nPatientennummer ' + patient.xpath('./patientId/text()')[0] + '\n') 
   
            patientdata = patient.xpath('./patientdata')[0]
            for field in fieldsP:
                for element in patientdata.xpath('./f' + field):
                    f.write(fieldsP[field] + ': ' + element.text + '\n')
            
            
            for treatment in patient.xpath('./treatment'):
                f.write('------------\n')
                for field in fieldsT:
                    for element in treatment.xpath('./f' + field):
                        f.write(fieldsT[field] + ': ' + element.text + '\n')


            for note in patient.xpath('./note'):
                f.write('------------\n\nnote type ' + note.xpath('./ffield/text()')[0] + '\n') # added, the field of the note
             
                for field in fieldsN:
                    for element in note.xpath('./f' + field):
                        f.write(fieldsN[field] + ': ' + element.text + '\n')            
 
#============================================================== Tree

def writeTree(root, filename, whichFieldsTxt, indicatorsAsOneLineTxt):
    whichFields = '--'
    indicatorsAsOneLine = '--'
    if(whichFieldsTxt is not None):
        whichFields = whichFieldsTxt
    if(indicatorsAsOneLineTxt is not None):
        indicatorsAsOneLine = indicatorsAsOneLineTxt

    nowF = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    filenameD = "neu_" + filename + '_' + nowF + '.xml'

    with codecs.open(filenameD,'w', 'utf8') as f: # output: utf8
        f.write('<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>\n\n')
        f.write('<!-- ' + indicatorsAsOneLine + ' -->\n')
        f.write('<!-- Fields specified: ' + whichFields + ' -->\n')
        f.write(etree.tostring(root, encoding='unicode', method='xml', pretty_print=True))

def getIndicatorsAsOneLine(indicatorDict):
    result = 'Section start fields that are read: '
    result = result + 'meta '
    for indicator in sorted(indicatorDict[0]): # sorted: a set has no fixed order
        result = result + indicator + ' '
    result = result + ', patient 6100, treatment 6200, notes '
    for indicator in sorted(indicatorDict[3]):
        result = result + indicator + ' '
    result = result.replace('\n', " ").replace("  ", " ").replace(" ,", ",") # str.replace() returns a new string
    return result + '\n'
                
#==============================================================
#============= E PRINT SOME TECHNICAL BACKGROUND ==============
#==============================================================

def getSetSorted(aSetOrDict): # helper for readable output: the unique elements (or keys) as a sorted list
    return sorted(set(aSetOrDict))
    
#==============================================================

def printTechnicalBackground(lines, fieldsToTake = None, printLeftover = False): # fieldsToTake = None: all fields

    isTreeAllFieldsWritten = False

    print(' Reading the data, into 4 sections (section fields are hardcoded in readDataReturnTree()) \n')
    # read the tree with all fields: readDataReturnTree(lines, isUseAllFields=True) // the statistics are done on this all-data tree
    tree, unused, pIdsWithTreatSet, pIdsWithNoteSet, numTreatFoundInFile, numNotesFoundInFile, indicators = readDataReturnTree(lines, True)
    
    if(fieldsToTake is not None): # write out, additionally, the tree with all fields for information
        writeTree(tree, 'Tree_all_fields', "all fields in the specified sections", getIndicatorsAsOneLine(indicators)) # writeTree() appends the timestamp and '.xml'
        isTreeAllFieldsWritten = True

    print('\n---------------------------------------------------------------------------------------')
    print('-- Metadata for this file  --------------------------------------------------------------')
    print('---------------------------------------------------------------------------------------\n')

    printMeta(tree.xpath('//meta')[0])
  
    print(' Please note: a start date of 1980-01-01 (probably) means: \n the data start when the practice software was installed on the computer')
   
    
    print(' \n ----------------------------------------------------------------------------------------')
    print(' ! all fields are read (regardless of whether they are defined for use or not)')
    print(' ----------------------------------------------------------------------------------------\n\n')


    print('\n----------------------------------------------------------------------------------------')
    print('-- STATISTICS-----------------------------------------------------------------------------')
    print('----------------------------------------------------------------------------------------\n')

    numTotalLines = len(lines)
    numTotalLinesF = float(numTotalLines)

    numUsed = numTotalLines - len(unused)
    numUsedF = float(numUsed)
    numUnusedF = float(len(unused))

    print(' Input lines                            ' + str(numTotalLines)) # lines of input

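    # share of used lines in percent (guard against an empty input file)
    percentUsed = (100.0 * numUsedF / numTotalLinesF) if numTotalLines > 0 else 0.0
    print(' total lines used                       ' + str(numUsed) + ' = ' + format(percentUsed, '.2f') + ' %')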
    print('\n   with indicator field for patient: 6100, treatment: 6200')
    print('\n   with indicator fields for meta:      ')
    print(indicators[0])
    print('\n   with indicator fields for note:      ')
    print(indicators[3])

    print(' -------------------------------------------')

 
    countPInTree, countTotalT, countTotalN, noOfPatientsWithoutTreatment, noOfPatientWithoutNote  = getTreeStatistics(tree)
 
    print(' ------------------------------------------- total information found:')
    
    print(' how many patients     :                ' + str(countPInTree) )
    print(' how many treatments   :                ' + str(numTreatFoundInFile) )
    print(' how many notes        :                ' + str(numNotesFoundInFile) )
    print(' -------------------------------------------')
    
    patientIdSet = set(tree.xpath('//patientId/text()'))

    print(' how many patients have treatment data: ' + str(len(pIdsWithTreatSet)) )
    print(' ... with treatment data filled in:     ' + str(len(patientIdSet) - noOfPatientsWithoutTreatment))
    print(' -------------------------------------------')
    print(' how many patients have notes:          ' + str(len(patientIdSet) - noOfPatientWithoutNote))
    print(' how many patients have no notes:       ' + str(len(patientIdSet) - len(pIdsWithNoteSet)) )
    print(' -------------------------------------------')
    print(' -------------------------------------------')


    print('\n\n----------------------------------------------------------------------------------------')
    print('-- FIELDS ------------------------------------------------------------------------------')
    print('----------------------------------------------------------------------------------------')

    pFieldSet, tFieldSet, nFieldSet = getTreeFields(tree) # after sections 6100, 6200, xxxx for note
 
    print('\n All fields found:\n')    
    print('\n Patient: ' + str(len(pFieldSet)) + '---------------------\n\n ')
    print(getSetSorted(pFieldSet))                               
    print('\n Treatment: ' + str(len(tFieldSet)) + '-------------------\n\n ')
    print(getSetSorted(tFieldSet))
    print('\n Note-----: ' + str(len(nFieldSet)) + '-------------------\n\n ')
    print(getSetSorted(nFieldSet))

    if(fieldsToTake is None):
        print(' All fields are read (in sections 6100, 6200, xxxx for notes, section fields in readDataReturnTree()). ')
    else:
        print(' Reading the data, into 4 sections (section fields in readDataReturnTree()), only defined fields. \n')
        tree, unused, pIdsWithTSet, pIdsWithNSet, numT, numN, indicators = readDataReturnTree(lines, False, fieldsToTake) # False

        pFieldSetAsDefined, tFieldSetAsDefined, nFieldSetAsDefined = getTreeFields(tree)
        fieldSetAsDefinedFound = pFieldSetAsDefined | tFieldSetAsDefined | nFieldSetAsDefined # set union
        
        print('\n Fields found:\n')    

        print('\n -- Fields missing, from those defined for extraction:\n')
        print(getSetSorted(set(fieldsToTake) - fieldSetAsDefinedFound))

        print('\n Patient: ------------------------------------------------\n')

        print('\n -- Fields found, from those defined for extraction:\n')
        print(getSetSorted(pFieldSetAsDefined))
   
        print('\n -- Fields additional, not defined for extraction:\n')
        print(getSetSorted(pFieldSet - set(fieldsToTake)))

        print('\n Treatment: ------------------------------------------------\n')
 
        print('\n -- Fields found, from those defined for extraction:\n')
        print(getSetSorted(tFieldSetAsDefined))

        print('\n -- Fields additional, not defined for extraction:\n')
        print(getSetSorted(tFieldSet - set(fieldsToTake)))

        print('\n Note: ------------------------------------------------\n')
 
        print('\n -- Fields found, from those defined for extraction:\n')
        print(getSetSorted(nFieldSetAsDefined))

        print('\n -- Fields additional, not defined for extraction:\n')
        print(getSetSorted(nFieldSet - set(fieldsToTake)))


    print('\n----------------------------------------------------------------------------------------')
    print('-- Background information completed----------------------------------------------------')
    print('----------------------------------------------------------------------------------------\n')
    print('.. programming instructions and background information in English language.\n')

    print('\nPlease note: this is a simple program to parse and process a BDT file. It is written in Python 2, without objects etc. - Changes can be made easily and nothing needs to be installed.\n\n\n\n\n')


    print('.. nun erfolgt das Einlesen auf Deutsch.\n')

    return isTreeAllFieldsWritten # add a confirmation
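
# --------------------------------------------------------------
# Sketch (not part of the original program): the "total lines used" figure in
# printTechnicalBackground() is a share of the input lines. A small helper,
# here under the hypothetical name formatShareAsPercent(), could do the scaling
# to percent in one place and guard against an empty input file.
# It is not called anywhere; it only illustrates the calculation.
# --------------------------------------------------------------
def formatShareAsPercent(part, total):
    """Return part/total as a percent string with two decimals."""
    if total == 0:  # empty file: avoid a ZeroDivisionError
        return 'n/a'
    return format(100.0 * part / total, '.2f') + ' %'

# Possible use inside printTechnicalBackground():
#   print(' total lines used ' + str(numUsed) + ' = ' + formatShareAsPercent(numUsed, numTotalLines))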

#==============================================================
#========================== MAIN ==============================
#==============================================================
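
# Sketch (an assumption, not the original design): main() evaluates the
# positional "mini console" arguments inline. The same rules could be kept in
# one helper that returns a dict, which makes them easier to test.
# The name parseMiniConsoleArgs() and the dict keys are hypothetical.
# Simplification: for 'a', main() additionally loads the GNUmed field
# descriptions; here only the field source is recorded.
def parseMiniConsoleArgs(args):
    """Parse: BDTFile x|s b|s a|g|c lpe 10 fields.csv (all positions optional)."""
    config = {
        'inFile': 'BDT.bdt',       # 1st position
        'writeTree': True,         # 2nd position: x | s (skip)
        'writeBackground': True,   # 3rd position: b | s (skip)
        'fieldSource': 'default',  # 4th position: 'all' | 'gnumed' | 'csv'
        'writeLeftover': False,    # 5th position: l
        'writePatientList': False, # 5th position: p
        'writeExtendedList': False,# 5th position: e
        'limit': 0,                # 6th position: 0 means no limit
        'fieldsCsvFile': None,     # 7th position: only used with 'c'
    }
    if len(args) > 0:
        config['inFile'] = args[0]
    if len(args) > 1 and 's' in args[1]:
        config['writeTree'] = False
    if len(args) > 2 and 's' in args[2]:
        config['writeBackground'] = False
    if len(args) > 3:
        if args[3] == 'a':
            config['fieldSource'] = 'all'
        elif args[3] == 'g':
            config['fieldSource'] = 'gnumed'
        elif args[3] == 'c' and len(args) > 6:  # 'c' needs the csv file name
            config['fieldSource'] = 'csv'
            config['fieldsCsvFile'] = args[6]
    if len(args) > 4:
        config['writeLeftover'] = 'l' in args[4]
        config['writePatientList'] = 'p' in args[4]
        config['writeExtendedList'] = 'e' in args[4]
    if len(args) > 5:
        try:
            limit = int(args[5])
        except ValueError:  # not a number: keep "no limit"
            limit = 0
        if limit > 0:
            config['limit'] = limit
    return config

# Possible use: config = parseMiniConsoleArgs(sys.argv[1:])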

def main():

    print('----------------------------------------------------------------------------------------')

    print('\nDas Programm starten: Bitte ein Terminal/eine Konsole öffnen. Dann bitte dorthin gehen, wo sich Datei bdt_extractor.py befindet (diese Datei)\n')
    print('Die einzulesende BDT-Datei muss in dem Dateiordner sein, in dem sich die Datei (bdt_extractor.py) befindet.\n')

    print('Die einzulesende BDT-Datei muss heissen: BDT.bdt \n')

    print('(Wenn die Datei einen anderen Namen hat, dann bitte die Datei umbenennen (zum Namen BDT.bdt). Wenn man dies nicht möchte, kann der Name der Datei hier, im Programm bdt_extractor.py geändert werden)\n\n')

    print('Dann bitte in der Konsole schreiben:\n\npython bdt_extractor.py\n\n.. und das Einlesen startet.\n\nWenn der Fehler kommt: no such file, dann ist entweder der Name der Datei nicht BDT.bdt, oder die Datei ist woanders (die BDT-Datei muss in dem Dateiordner sein, wo auch bdt_extractor.py ist)\n')

    print('----------------------------------------------------------------------------------------\n')

    inFile = 'BDT.bdt' # << please change the file name here

    # ------------------------------------------------

    # ================================================
    doWriteTree = doWriteBackground = doWriteFields = True
    doWriteLeftover = False
    doWritePatientList = doWritePatientTreatmentNoteList = False
    doUseAllFields = doUseXdtBDTFieldsFromGNUmedFileDict = doUseXdtBDTFieldsFromCsvFile = False
    limitNumberOfPatients = 0  # 0: no limit
    fieldsCsvFile = None

    # ================================================
    # MINIMALISTIC CONFIG CONSOLE ==================== MINI CONFIG, MINI KONSOLE
    # ================================================

    consoleArgs = sys.argv[1:] # BDTFile x|s b|s a|g|c lpe 10 fields.csv
  
    # default: BDT.bdt x b    # write tree, print background info, skip other config
    #
    # E.g.: BDT.bdt s s g lpe 10 // 'BDT.bdt' default 
    #
    # skip tree, skip background, use gnumed for lists, write leftover and lists, limit num-of output to 10

    if(len(consoleArgs) > 0): inFile = consoleArgs[0]
    if(len(consoleArgs) > 1): # 's' = skip the action
        if('s' in consoleArgs[1]): # x|s skip writing the tree in xml form
            doWriteTree = False
    if(len(consoleArgs) > 2):    
        if('s' in consoleArgs[2]): # b|s skip printing config background info
            doWriteBackground = False

    if(len(consoleArgs) > 3):      # a|g|c : use a custom field list for 'pe' (not for database tables)
        if('a' == consoleArgs[3]): # a: all fields, no restriction. Use gnumed fields for description (of fields.py, please see below **)
            doUseAllFields = doUseXdtBDTFieldsFromGNUmedFileDict = True
        elif('g' == consoleArgs[3]): # g: gnumed fields (of the file fields.py, please see below **)
            doUseXdtBDTFieldsFromGNUmedFileDict = True
        elif('c' == consoleArgs[3] and len(consoleArgs) > 6): # c: custom field list; its csv file must be given as the 7th argument, otherwise the default fields are used
            doUseXdtBDTFieldsFromCsvFile = True
            fieldsCsvFile = consoleArgs[6]

    if(len(consoleArgs) > 4): 
        if('l' in consoleArgs[4]): # l: write leftover lines
            doWriteLeftover = True
        if('p' in consoleArgs[4]): # p: write lists: p-data, e-xtended: p-t-n data to file
            doWritePatientList = True 
        if('e' in consoleArgs[4]): # e
            doWritePatientTreatmentNoteList = True
    
    if(len(consoleArgs) > 5):      # limit number of patients for list output (has no influence on writing database tables)
        try:
            limit = int(consoleArgs[5])
        except ValueError:         # not a number: keep 0 (no limit)
            limit = 0
        if(limit > 0):
            limitNumberOfPatients = limit
    # 7th argument (custom field list file): already evaluated above, together with 'c'
    # ================================================
    # **: fields.py as: xdt_id_map   dict[field]=description
    # e.g. extract the dict here: https://github.com/ncqgm/gnumed/blob/master/gnumed/gnumed/client/business/gmXdtMappings.py
    # it will be created if not found, in getxDTBDTFieldsFromGNUmed()
    # ================================================
    whichFieldsText = ''
    whichFieldsTextENTree = ''    
    whichFilesCreated = {}
    whichFilesExplanation = {}
    whichFilesCsvExplanationText = isGnumedUsed = False

    fields = None
    if(doUseXdtBDTFieldsFromGNUmedFileDict): # please note: custom fields are used for lists (not for the database tables)
        fields, isNewlyCreatedGnumedFieldsPy = getxDTBDTFieldsFromGNUmed()
        if(isNewlyCreatedGnumedFieldsPy):
            whichFilesCreated[0]='fields.py (GNUmed Felder)'
            whichFilesExplanation[0]='**'
        whichFieldsText = "GNUmed (fields.py), GPL v2"
        whichFieldsTextENTree = 'GNUmed BDT https://github.com/ncqgm/gnumed/blob/master/gnumed/gnumed/client/business/gmXdtMappings.py, GPL v2'
        isGnumedUsed = True

    elif(doUseXdtBDTFieldsFromCsvFile):
        fields = getxDTBDTFieldsFromCsvFile(fieldsCsvFile)
        whichFieldsText = "Feldliste aus " + fieldsCsvFile
        whichFieldsTextENTree = "Fields from the file " + fieldsCsvFile
    else:
        fields = getxDTBDTFieldsDefaultAllSections() # this file
        whichFieldsText = "Felder, die in getxDTBDTFieldsDefaultAllSections() definiert wurden"
        whichFieldsTextENTree = "Fields that are specified in method getxDTBDTFieldsDefaultAllSections()"
   

    lines = readBDTFile(inFile)

    isTreeAllFieldsWritten = False
    if(doWriteBackground):
        if(doUseAllFields):
            isTreeAllFieldsWritten = printTechnicalBackground(lines, None, doWriteLeftover) # should be False
        else:
            isTreeAllFieldsWritten = printTechnicalBackground(lines, fields, doWriteLeftover) # should be True
        if(isTreeAllFieldsWritten):
            whichFilesCreated[50]='Tree_all_fields.xml'
            whichFilesExplanation[50]='XML-Baum. Mit den Feldern: Alle Felder wurden gelesen'           

    tree = None
    indicators = None
    if(doUseAllFields):
        tree, unused, pIdsWithTreatSet, pIdsWithNoteSet, numTreatFoundInFile, numNotesFoundInFile, indicators = readDataReturnTree(lines, True)
    else:
        tree, unused, pIdsWithTreatSet, pIdsWithNoteSet, numTreatFoundInFile, numNotesFoundInFile, indicators = readDataReturnTree(lines, False, fields)
    # for debugging: print(etree.tostring(tree, pretty_print=True))
    #            or: print(etree.tostring(tree, encoding='unicode', method='text'))

    if(doWriteTree):
        writeTree(tree, "Tree", whichFieldsTextENTree, getIndicatorsAsOneLine(indicators))
        whichFilesCreated[10]='Tree.xml          '
        whichFilesExplanation[10]='XML-Baum. Mit den Feldern: ' + whichFieldsText
 
    if(doWriteFields): # always True, please implement config if desired

        if(doUseXdtBDTFieldsFromGNUmedFileDict):
            writeFile('GNUmedFelder',returnFieldAndDescriptionAsLines(fields))
            writeFile('GNUm',replaceDangerousCharsForCsvInLines(returnFieldAndDescriptionAsLines(fields)), 'csv')
            whichFilesCreated[20]='GNUmedFelder.txt'
            whichFilesExplanation[20] = 'Zur Information'
            whichFilesCreated[30]='GNUm.csv          '
            whichFilesCsvExplanationText = True
            whichFilesExplanation[30] = '*, **'

        elif(doUseXdtBDTFieldsFromCsvFile):
            pass # the csv file already exists. Anyhow, file Felder_verwendet.csv is written

        #elif(doUseDefaultFields_of_method_getxDTBDTFieldsDefaultAllSections) # No extra file. - Anyhow, file Felder_verwendet.csv is written

        writeFile('Felder_verwendet',replaceDangerousCharsForCsvInLines(returnFieldAndDescriptionAsLines(fields)), 'csv')
        whichFilesCreated[40]='Felder_verwendet.csv'
        whichFilesCsvExplanationText = True
        whichFilesExplanation[40]='Mit den Feldern: ' + whichFieldsText + '. *'        
   
    if(doWriteLeftover):
        writeFile('BDT_Datei_Zeilen_ungenutzt', unused)
        whichFilesCreated[60]='BDT_Datei_Zeilen_ungenutzt.txt'
        whichFilesExplanation[60]='Zeilen der BDT-Datei, für die Abschnitte und Felder nicht definiert wurden'
 

    print('\n----------------------------------------------------------------------------------------')
    print('-- WAS MACHT DIE DATEI bdt_extractor.py ?------------------------------------------------')
    print('----------------------------------------------------------------------------------------\n')

    print(' Diese Datei bdt_extractor.py enthält:\n\n * Eine Erklärung, wie eine BDT-Datei aufgebaut ist (auf Deutsch und auf Englisch), und \n * Quellenangaben, wo man dies nachlesen kann. \n\n * Dann eine Liste mit den Feldern, die bdt_extractor.py liest\n ** hier kann man Felder ergänzen oder wegnehmen. \n ** Die Reihenfolge der Daten wird so ausgeschrieben, wie dort aufgelistet;\n    das heisst: man kann die Reihenfolge der Felder, dort, ändern\n\n Dann folgt die Programmierung des \n a) Einlesens und Datenfindens, und \n\n b) des Schreibens der Information in einen XML-Baum und in (csv) Dateien; diese kann man z.B. in eine Datenbank einlesen; \n *  entsprechend des Datenbankschemas Patient:Behandlung 1:n, Patient:Abrechnungsnotiz 1:n\n\n Es werden einige Statistiken ausgeschrieben, und die gefundenen und verwendeten Felder, in englischer Sprache \n (bitte nach oben scrollen. Es wird über der Überschrift "WAS MACHT DIE DATEI" ausgegeben)\n ')

    print('----------------------------------------------------------------------------------------')
    
    print('\n .. Die Felder, die in einer BDT-Datei beschrieben sind, werden beschrieben im GNU-MED-PROJEKT.\n (Die Felder werden in diesem Programm, auf Wunsch, verwendet. Sie sind in der Datei fields.py beschrieben.)\n\n Das GNUmed-Projekt ist ein FOSS (free and open source, und somit auch kostenloses) Praxisverwaltungsprogramm.\n Im GNUmed-Programm kann man BDT-Dateien erstellen.\n (Im Internet kann man nachsehen, wie weit das GNUmed-Projekt bereits vorangekommen ist.)')

    print('\n----------------------------------------------------------------------------------------')
    print('-- DATEN LESEN -------------------------------------------------------------------------')
    print('----------------------------------------------------------------------------------------\n\n')

    print('Datei: ' + inFile + ' wird gelesen. Die Datei hat ' + str(len(lines)) + ' Zeilen')

    print('\n\nEs werden diese Abschnitte gelesen:')
    print('- Metadaten:')
    print(indicators[0])
    print('- Patient: 6100\n- Behandlung: 6200')
    print('- A-Notizen:')
    print(indicators[3])
    print('___(Die Abschnittsfelder können im Programmcode in readDataReturnTree() geändert werden.)___')
    
    print('\n\nEs werden diese Felder gelesen: ' + whichFieldsText)
    

    print('\n\n-- Metadaten der Datei ' + inFile + ': ----------------------------------------------------\n')
 
    printMeta(tree.xpath('//meta')[0])

    print('\n Wenn bei "Zeitraum der Speicherung" 1.1.1980 als Startdatum angegeben ist, so bedeutet dies (vermutlich) \ndass die Daten beginnen mit dem Tag, da das Praxisprogramm auf diesem Computer installiert wurde\n')

    print('----------------------------------------------------------------------------------------')

    print('Es werden nun 3 Dateien erstellt: Patienten, Behandlungs- und Abrechnungsnotizen-Dateien.\n Diese Dateien können in eine Datenbank eingelesen werden. Die erstellten Dateien enthalten nur die Felder, die in dieser Datei (bdt_extractor.py), oben, definiert wurden.\n\n')


    print('Dies wurde in der BDT Datei gefunden:\n')


    # please note: Treatments and notes that did not contain data were not included in the tree -------------
   
    countPInTree, countTInTree, countNInTree, noOfPatientsWithoutAnyTreatment, noOfPatientWithoutAnyNote = getTreeStatistics(tree)

    print('Alle Patienten in der Datei:                                         ' + str(countPInTree))
    print('Alle Behandlungsdaten in der Datei:                                  ' + str(numTreatFoundInFile))
    print('Alle Abrechnungsnotizen in der Datei:                                ' + str(numNotesFoundInFile))

    patientIdSet = set(tree.xpath('//patientId/text()'))

    print('\nSo viele Patienten haben Behandlungsdaten:                           ' + str(len(pIdsWithTreatSet)) )

    print('  So viele Patienten haben Behandlungsdaten ausgefüllt:              ' + str(len(patientIdSet) - noOfPatientsWithoutAnyTreatment))

    print('\nSo viele Patienten haben Abrechnungsnotizen:                         ' + str(len(patientIdSet) - noOfPatientWithoutAnyNote))
    print('  Patienten ohne Abrechnungsnotizen:                                  ' + str(len(patientIdSet) - len(pIdsWithNoteSet)))


    patientTable = returnPatientTableForDatabase(tree)
    writeFile('Tabelle_Patient', patientTable, 'csv')

    treatmentTable = returnTreatmentTableForDatabase(tree)
    writeFile('Tabelle_Behandlung', treatmentTable, 'csv')

    notesTable = returnNotesTableForDatabase(tree)
    writeFile('Tabelle_Abrechnungsnotiz', notesTable, 'csv')

    whichFilesCreated[70]='Tabelle_Patient.csv/..Behandlung.csv/..Abrechnungsnotiz.csv'
    whichFilesExplanation[70]='Mit den Feldern: Felder, die in getxDTBDTFieldsDefault() definiert wurden'
 
    print('\n\n3 csv-Dateien wurden erstellt. \n')

    print('\n\n=============================================\nEine MINI KONSOLE bietet diese Eingabe:\n\n python bdt_extractor.py BDT.bdt x b g lpe 10    oder\n python bdt_extractor.py BDT.bdt x b c lpe 10 felder.csv\n\n Sie wird im Code unter mini config beschrieben.')
    print('\n ohne Angabe wird dies ausgeführt:          python bdt_extractor.py BDT.bdt x b    oder BDT.bdt x b s s')
    print('\n Ausgaben und Dateierstellung unterdrücken: python bdt_extractor.py BDT.bdt s s')
    print('\n Alle Ausgaben:                             python bdt_extractor.py BDT.bdt x b g lpe 10 \n (Verwendung von gnumed (g), und pe auf 10 Datensätze begrenzt)\n=============================================\n')

    print(' ERKLÄRUNG:\n')
    print(' Man kann das Programm einfach so ausführen:  python bdt_extractor.py   oder, man kann Angaben dazu setzen:\n')
    print('                         mit Angaben, z.B.:   python bdt_extractor.py BDT.bdt x b c lpe 10 felder.csv\n')
    print(' Als Eingabe ist möglich: BDTFile x|s b|s a|g|c lpe 10 fields.csv              // "|" bedeutet: entweder-oder\n')
    print(' 1. Stelle: BDTFile: diese Datei wird eingelesen. Wenn nichts angegeben ist, wird nach Datei: BDT.bdt, gesucht')
    print(' 2. Stelle: x: gefundene Daten werden als XML-Baum ausgeschrieben. s: Dies wird nicht getan.')
    print(' 3. Stelle: b: background: Information wird ausgegeben, so die Felder, die gefunden wurden. s: Dies wird nicht getan.')
    print(' 4. Stelle: a oder g oder c. a: alle Felder werden gesucht. g oder c: Felder zu suchen werden eingeschränkt, auf:')
    print('            g: auf Felder aus dem GNUmed Projekt, dazu wird eine Datei eingelesen: fields.py. Wenn die Datei fields.py')
    print('              nicht existiert, wird sie erstellt.')
    print('            c: auf eine (custom=selbstgewählte) Liste, daraus werden die Felder gelesen. Die Liste muss so geschrieben')
    print('              sein: in Zeilen "Field;Beschreibung". Diese Datei (z.B. fields.csv) muss angegeben werden.')
    print(' 5. Stelle: l, p, e: alles kann gesetzt sein. l: Zeilen aus der BDT-Datei, die nicht verwendet wurden, werden als Datei geschrieben')
    print('              (Daten werden nur verwendet, wenn sie bei Feldern stehen, die definiert wurden)')
    print('            p und e: Gefundene Information wird als Datei geschrieben. p = Daten zum Patienten,')
    print('            e = "extended": Daten zum Patienten, Behandlung, Abr-Notiz')
    print(' 6. Stelle: 10 bzw. eine Zahl: das gehört zur 5. Stelle: pe. Z.B. werden nur die ersten 10 gefundenen Patienten ausgeschrieben.')
    print(' 7. Stelle: felder.csv: das gehört zur 4. Stelle "c". Wenn c gesetzt ist, muss eine Datei hier angegeben werden.')
    print('              Wenn keine Datei angegeben wird, aber c gesetzt ist, dann werden die Felder, die in bdt_extractor.py, weiter oben,')
    print('              in getxDTBDTFieldsDefault(), definiert sind, genommen.\n')
    print(' 2. und 3.: s: "skip" - auslassen. Wenn an 2. und 3. Stelle ein s gesetzt ist, dann wird die Aktion nicht ausgeführt.\n')
    print(' Achtung, bitte: Die Reihenfolge und die Stelle bzw. Position der Eingabe müssen eingehalten werden. Wenn man zum Beispiel möchte,')
    print('             dass nur die übriggebliebenen Zeilen als Datei geschrieben werden, aber nicht der Baum, und auch keine ')
    print('             technische Information (background) ausgegeben wird;\n')
    print('             dann ist dies die Eingabe: BDT.bdt s s l\n=============================================\n\n\n')
  

    # start the output of lists using the mini config console: ----------------------------------------

    if(doWritePatientList):
        print('\n\nEine Liste mit einigen P-Daten zur Überprüfung wird erstellt. Es wurden diese Felder gelesen: ' + whichFieldsText)

        fieldsToTake = None
        if(doUseXdtBDTFieldsFromGNUmedFileDict or doUseXdtBDTFieldsFromCsvFile):
            fieldsToTake = fields
        writePatientList(tree, "neu_Liste_P", limitNumberOfPatients, fieldsToTake)
        whichFilesCreated[80]='Liste-P.txt   '
        whichFilesExplanation[80]='Mit den Feldern: ' + whichFieldsText

 
    if(doWritePatientTreatmentNoteList):
        print('\n\nEine Liste mit einigen P-B-A-Daten zur Überprüfung wird erstellt. Es wurden diese Felder gelesen: ' + whichFieldsText)

        fieldsToTake = None
        if(doUseXdtBDTFieldsFromGNUmedFileDict or doUseXdtBDTFieldsFromCsvFile):
            fieldsToTake = fields
        writePatientTreatmentNoteList(tree, "neu_Liste_PBA", limitNumberOfPatients, fieldsToTake)
        whichFilesCreated[90]='Liste-P-B-A.txt'
        whichFilesExplanation[90]='Mit den Feldern: ' + whichFieldsText

    
    if(len(whichFilesCreated) > 0):
        theFileKeysSorted = sorted(whichFilesCreated) # the keys, in ascending order

        print('\n\nDateien wurden erstellt: (mit dem Präfix "_neu")')
        for key in theFileKeysSorted:
            print('- ' + whichFilesCreated[key])
    
        print('\n---\nErklärung: (Dateien wurden erstellt mit dem Präfix "_neu")\n')
        for key in theFileKeysSorted:
            print('- ' + whichFilesCreated[key] + '\t\t' + whichFilesExplanation[key])

        if(whichFilesCsvExplanationText):
            print('---\n*   Die Datei kann zum Einlesen verwendet werden (Mini Konsole: "c"). In der Datei kann man Felder ergänzen')

        if(isGnumedUsed):
            print('\n**  Die Felder des BDT-Formats werden im GNUmed Projekt beschrieben.\n    https://github.com/ncqgm/gnumed/blob/master/gnumed/gnumed/client/business/gmXdtMappings.py. Die Dateien des GNUmed Projekts\n    unterliegen der GPL v2 Lizenz. - Daraus wurde ein Teil verwendet und in der Datei fields.py gespeichert.')


    print('\n\n(Mit einer MINI KONSOLE kann man das Programm steuern: Felder verwenden, Dateien ausgeben. Bitte scrollen Sie in diesem Fenster nach oben, um mehr darüber zu erfahren)')
    

    print('----------------------------------------------------------------------------------------')
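

# Sketch (hypothetical helper, not used by the original program): main() keeps
# two dicts keyed by a sort number, whichFilesCreated and whichFilesExplanation,
# and prints them in key order. The formatting could be separated from the
# printing as below; the name formatCreatedFilesSummary() is an assumption.
def formatCreatedFilesSummary(filesCreated, filesExplanation):
    """Return one line per created file, ordered by the numeric dict key."""
    summaryLines = []
    for key in sorted(filesCreated):
        name = filesCreated[key].strip()  # drop the padding used for alignment
        summaryLines.append('- ' + name + '\t\t' + filesExplanation.get(key, ''))
    return summaryLines

# Possible use at the end of main():
#   for line in formatCreatedFilesSummary(whichFilesCreated, whichFilesExplanation):
#       print(line)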


if __name__ == "__main__":
    main()