Import line listings
Now that all files where downloaded and we where sure they contained no errors, We decided to import them into a PostgreSQL database in order to make the data from the reports available for further, detailed analyses. We wrote a python version 3 script with various routines to perform checks and migrate data into a relational structure. In the script we made sure that all data except the links to the website would be processed and would end up in the database. For convenience, all data retrieved from the downloaded files was put in a schema called “import” in the database. The database schema contains 2 general tables; import.products and import.substances that contain the product or substance code followed by a product or substance name. As the CSV files only contain codes, we can later use these tables to get the “human readable name” of products or substances. The import.substance
and import.product tables
contain the individual reports where the codes for the product or the substance have been added in the process, plus an indicator about the “seriousness” as this is a filter parameter that was available while requesting the line listings, but was not available in the resulting CSV file. The CSV field for reaction has been split in individual records resulting in the table import.substance_reaction
and import.product_reaction
. We used a similar splitting process to create the tables import.substance_subject drug_list
, import.product_subject_drug_list
and import.substance_concomitant_drug_list
, import.product_concomitant_drug_list
and import.substance_literature
and import.product_literature
resulting in rows containing references that can later be used for further research but have not been processed further in any of our examples. With the script that created the database, we managed to store all information available from the CSV files in the database. From this database we generated the file ema.dump
using the following command:
pg_dump -Fc -Z 9 -U superset --file=~/ema.dump ema
from this dump, a database can be constructed using:
pg_restore -U postgres -Fc -j 8 --no-owner -d ema ~/ema.dump
this will take between 2 and 15 minutes depending on the speed of your computer and the settings of postgresql
More information and instructions on installing a database can be found on the PostgreSQL documentation website in particular the documentation on pg_dump and pg_restore.
Source code for importing line listings
The script that copies the content of the CSV files into the PostgreSQL database is called the OpenEma CSV reader for which you can view and download the required source files from the GitLab repository. If you are a software developer and/or are interested in checking python version 3 code, we welcome you to check the source code and help us detect any mistakes.