Introduction

The European Medicines Agency (EMA) is a decentralised agency of the European Union (EU). EMA hosts a system called EudraVigilance used for managing and analysing information on suspected adverse reactions to medicines.

A restricted set of data elements in EudraVigilance is available and provided to healthcare professionals, patients and the general public. To allow for a more user friendly experience OpenEMA downloaded the data.

OpenEMA is a PRIVATE entity, not affiliated with EMA. In this documentation we describe how data was downloaded from EMA and subsequently put in a database. We then show example analyses that were run against the database using Standard Query Language (SQL). Finally we create visualisations for data and trends using the open source version of Metabase.

For example; the top-10 reported substances represented in a bar chart.

Or substance reactions visualised in a line graph.

This project is meant to share the data and help you, our audience, to reproduce our examples and kickstart a community of amateur and professional (data) scientists to use this database and inform each other and other professionals and the public on what information is available and what the information can tell us.

About the data

We downloaded line listings for Individual Case Safety Reports (ICSR, shortened throughout this documentation as report) and processed them into a PostgreSQL database. PostgreSQL can be queries with SQL that enables users to ask questions such as "Select the sum of all records from a table of data where column X has a date value of Y". PostgreSQL has a beginner tutorial available that explains the possibilities of using SQL.

Community

We strongly encourage everyone to share ideas and insights and open a broad discussion on the value of the EMA Eudravigilance data and the information available within the data. We welcome everyone to the OpenEMA community.

To enable this goal, the data and this documentation is made available under the CC-BY-NC-SA license.

Keep in mind, the examples are just that, examples. They can (and will) contain (minor) errors due to the fact that reports spanning multiple products or substances are not always correctly queried in the examples. We can use help to make sure the examples are checked, validated and corrected where necessary.

This documentation

The documentation is maintained with mdbook. The source code for this website is on gitlab.

Pre-compiled binaries

Executable binaries are available for download on the GitHub Releases page. Download the binary for your platform (Windows, macOS, or Linux) and extract the archive. The archive contains an mdbook executable which you can run to build your books.

To make it easier to run, put the path to the binary into your PATH.

Serve

You can run the documentation as a website using:

mdbook serve -p 3001 --open

Database

We use PostgreSQL as database.

How we get line listings

Line listings are "The data and information provided at substance level is in the form of an electronic Reaction Monitoring Report (eRMR) containing aggregated data and a line listing with details of the individual cases." (see: glossary).

At first, we started to follow the links in the internet pages and clicked every download individually. We then renamed all downloaded files and used automated routines to process them into a database for further investigation. The processing of the files became easy, but the downloading of them resulted in a lot of manual labour. We discovered that because of the large number of available ICSR’s, this would probably take years.

So we took a good look using the web browsers “development tools” to see what data was being processed by the javascript functions in the various web pages. This lead to the discovery of the link that produces line listing files at the url https://dap.ema.europa.eu/analytics/saw.dll?Go where we detected that a set of variables was constantly changing as per individual substance or product using a set of query parameters in a HTTP GET request. We then set out to directly call this url with the parameters for substance or products. This seemed to work but a lot of files we downloaded seemed broken and contained warnings in javascript/html when we inspected them.

The error reported was often that a maximum number of records was reached or a timeout took place. So we wanted to find a way to download smaller line listings which would not generate an error. To achieve this, we inspected the query parameters available and tested various ways of setting or unsetting them. We managed to use some to filter the result, f.i. we could download a single year by setting a parameter "Line Listing Objects”.”Gateway Year” “eq” “2020” which indeed resulted in records for the year of 2020 only. We kept at this until we had defined a couple of interchangeable filters that where usable for both product and substance line listings and made sure we stayed within the maximum number of records. Our scripts, written in python version 3, is able to download line listings as they are available via the portal using the filter options (which are also available through the portal as select options in visual elements) for further automated processing on a personal computer.

We used two different scripts for the download.

One script that parses the web pages that show maps/tables giving information about the number of cases per product or substance per country. This script generated the CSV files available in the file EMA-countries-2023-06-20.zip these files where uploaded into the database’s public schema as the tables public.substance_cases_per_country and public.product_cases_per_country.
Another script parsed the webpages that generate “line listings” resulting in CSV files containing detailed information about ICSR’s. The result are the files available in the files EMA-products-2023-06-20.zip and EMA-substances-2023-06-20.zip.

All scripts are written in python version 3 and available on request. We chose not to disclose the scripts used for downloading line listings publicly, but on request to make sure people running these scripts know what they are doing as we do not want to stress the EMA web application.

Import line listings

Now that all files where downloaded and we where sure they contained no errors, We decided to import them into a PostgreSQL database in order to make the data from the reports available for further, detailed analyses. We wrote a python version 3 script with various routines to perform checks and migrate data into a relational structure. In the script we made sure that all data except the links to the website would be processed and would end up in the database. For convenience, all data retrieved from the downloaded files was put in a schema called “import” in the database. The database schema contains 2 general tables; import.products and import.substances that contain the product or substance code followed by a product or substance name. As the CSV files only contain codes, we can later use these tables to get the “human readable name” of products or substances. The import.substance and import.product tables contain the individual reports where the codes for the product or the substance have been added in the process, plus an indicator about the “seriousness” as this is a filter parameter that was available while requesting the line listings, but was not available in the resulting CSV file. The CSV field for reaction has been split in individual records resulting in the table import.substance_reaction and import.product_reaction. We used a similar splitting process to create the tables import.substance_subject drug_list, import.product_subject_drug_list and import.substance_concomitant_drug_list, import.product_concomitant_drug_list and import.substance_literature and import.product_literature resulting in rows containing references that can later be used for further research but have not been processed further in any of our examples. With the script that created the database, we managed to store all information available from the CSV files in the database. From this database we generated the file ema.dump using the following command:

pg_dump -Fc -Z 9 -U superset --file=~/ema.dump ema

from this dump, a database can be constructed using:

pg_restore -U postgres -Fc -j 8 --no-owner -d ema ~/ema.dump

this will take between 2 and 15 minutes depending on the speed of your computer and the settings of postgresql

More information and instructions on installing a database can be found on the PostgreSQL documentation website in particular the documentation on pg_dump and pg_restore.

Source code for importing line listings

The script that copies the content of the CSV files into the PostgreSQL database is called the OpenEma CSV reader for which you can view and download the required source files from the GitLab repository. If you are a software developer and/or are interested in checking python version 3 code, we welcome you to check the source code and help us detect any mistakes.

Clean

After importing the line listings there are duplicate records in the database in various tables. The following SQL script makes sure all tables in the import schema contained unique reports and supporting tables only.

Remove duplicates from substance

CREATE TABLE import.substance_unique AS
SELECT
  substance_id,
  local_number,
  report_type,
  report_date,
  qualification,
  country,
  age_group_lower_months,
  age_group_upper_months,
  age_group_reporter,
  is_child_report,
  sex,
  serious
FROM import.substance GROUP BY 
  substance_id,
  local_number,
  report_type,
  report_date,
  qualification,
  country,
  age_group_lower_months,
  age_group_upper_months,
  age_group_reporter,
  is_child_report,
  sex,
  serious;
  
DROP TABLE import.substance;

ALTER TABLE import.substance_unique
  RENAME TO substance;

Remove duplicates from substance_concomitant_drug_list

CREATE TABLE import.substance_concomitant_drug_list_unique AS
SELECT
  substance_id,
  local_number,
  report_date,
  drugs,
  characteristic,
  indication,
  action,
  duration,
  dose,
  route
FROM import.substance_concomitant_drug_list GROUP BY 
  substance_id,
  local_number,
  report_date,
  drugs,
  characteristic,
  indication,
  action,
  duration,
  dose,
  route;
DROP TABLE import.substance_concomitant_drug_list;

ALTER TABLE import.substance_concomitant_drug_list_unique
  RENAME TO substance_concomitant_drug_list;

Remove duplicates from substance_subject_drug_list

CREATE TABLE import.substance_subject_drug_list_unique AS
SELECT
  substance_id,
  local_number,
  report_date,
  drugs,
  characteristic,
  indication,
  action,
  duration,
  dose,
  route
FROM import.substance_subject_drug_list GROUP BY 
  substance_id,
  local_number,
  report_date,
  drugs,
  characteristic,
  indication,
  action,
  duration,
  dose,
  route;

DROP TABLE import.substance_subject_drug_list;

ALTER TABLE import.substance_subject_drug_list_unique
  RENAME TO substance_subject_drug_list;

Remove duplicates from substance_reaction

CREATE TABLE import.substance_reaction_unique AS
SELECT
  substance_id,
  local_number,
  report_date,
  reaction,
  duration,
  outcome,
  seriousness_criteria
FROM import.substance_reaction GROUP BY 
  substance_id,
  local_number,
  report_date,
  reaction,
  duration,
  outcome,
  seriousness_criteria;
DROP TABLE import.substance_reaction;
ALTER TABLE import.substance_reaction_unique RENAME TO substance_reaction;

Remove duplicates from substance_literature

CREATE TABLE import.substance_literature_unique AS
SELECT
  substance_id,
  local_number,
  report_date,
  literature
FROM import.substance_literature GROUP BY 
  substance_id,
  local_number,
  report_date,
  literature;
DROP TABLE import.substance_literature;
ALTER TABLE import.substance_literature_unique RENAME TO substance_literature;

Remove duplicates from product

CREATE TABLE import.product_unique AS
SELECT
  product_id,
  local_number,
  report_type,
  report_date,
  qualification,
  country,
  age_group_lower_months,
  age_group_upper_months,
  age_group_reporter,
  is_child_report,
  sex,
  serious
FROM import.product GROUP BY 
  product_id,
  local_number,
  report_type,
  report_date,
  qualification,
  country,
  age_group_lower_months,
  age_group_upper_months,
  age_group_reporter,
  is_child_report,
  sex,
  serious;
 DROP TABLE import.product;
 ALTER TABLE import.product_unique RENAME TO product;

Remove duplicates from product_concomitant_drug_list

CREATE TABLE import.product_concomitant_drug_list_unique AS
SELECT
  product_id,
  local_number,
  report_date,
  drugs,
  characteristic,
  indication,
  action,
  duration,
  dose,
  route
FROM import.product_concomitant_drug_list GROUP BY 
  product_id,
  local_number,
  report_date,
  drugs,
  characteristic,
  indication,
  action,
  duration,
  dose,
  route;
 DROP TABLE import.product_concomitant_drug_list;
 ALTER TABLE import.product_concomitant_drug_list_unique
   RENAME TO product_concomitant_drug_list;

Remove duplicates from product_subject_drug_list

CREATE TABLE import.product_subject_drug_list_unique AS
select
  product_id,
  local_number,
  report_date,
  drugs,
  characteristic,
  indication,
  action,
  duration,
  dose,
  route
FROM import.product_subject_drug_list GROUP BY 
  product_id,
  local_number,
  report_date,
  drugs,
  characteristic,
  indication,
  action,
  duration,
  dose,
  route;
 DROP TABLE import.product_subject_drug_list;
 ALTER TABLE import.product_subject_drug_list_unique
   RENAME TO product_subject_drug_list;

Remove duplicates from product_reaction

CREATE TABLE import.product_reaction_unique AS
SELECT
  product_id,
  local_number,
  report_date,
  reaction,
  duration,
  outcome,
  seriousness_criteria
FROM import.product_reaction GROUP BY 
  product_id,
  local_number,
  report_date,
  reaction,
  duration,
  outcome,
  seriousness_criteria;
 DROP TABLE import.product_reaction;
 ALTER TABLE import.product_reaction_unique RENAME TO product_reaction;

Remove duplicates from product_literature

CREATE TABLE import.product_literature_unique AS
SELECT
  product_id,
  local_number,
  report_date,
  literature
FROM import.product_literature GROUP BY 
  product_id,
  local_number,
  report_date,
  literature;
DROP TABLE import.product_literature;
ALTER TABLE import.product_literature_unique RENAME TO product_literature;

Content

Running the following script will generate metrics. When you run this script on your database, it should match the metrics from the database dump called ema.dump that should accompany this document.

WITH tbl AS
  (SELECT table_schema,
          TABLE_NAME
   FROM information_schema.tables
   WHERE TABLE_NAME NOT LIKE 'pg_%'
     AND table_schema IN ('public', 'import'))
SELECT table_schema,
       TABLE_NAME,
       (xpath('/row/c/text()', query_to_xml(
           format(
               'SELECT count(*) as c from %I.%I',
               table_schema, TABLE_NAME),
               FALSE, TRUE, ''))
       )[1]::text::int AS rows_n
FROM tbl
ORDER BY rows_n DESC;

Result of running the script to generate metrics:

table_schema	table_name	rows_n
import	substance_reaction	45951332
import	substance_subject_drug_list	32594176
import	substance_concomitant_drug_list	21117169
import	substance	13457327
import	product_reaction	10881192
import	product_subject_drug_list	6453406
import	product_concomitant_drug_list	5932744
import	product	3369026
import	substance_literature	1499182
import	product_literature	109405
public	substance_cases_per_country	30979
public	product_cases_per_country	11759
import	substances	4333
import	products	1469

Now that we have cleaned all duplicates from all tables, we proceed to look into the data to perform analyses.

Analysis

This chapter contains a number of analyses that are given as examples on how to process and access the data. We strongly encourage you to run these examples as they will give insight the integrity of the data you are working with. If your database generates different results or you detect errors, please inform us. When you create and perform your own analyses, please share your code and outcome preferably under the CC-BY-NC-SA license.

Reports

In this chapter we perform some example analyses on reports.

Unique Reports

We want to know about unique reports. The key identifying value of a report is the local number. We asumed these to be unique but reports can be updated or even deleted over time. We noticed that when updates take place, another report with the same local number can appear in the downloads. this can be across multiple line listings so it is a bit hard to detect when using the EMA portal, but we discovered they are present. Besides that, reports are overlapping between substance line listings and product line listings. Our first course of action is to grab all local number's and isolate them in a table. We use a table to increase performance later on when we want to access unique reports again.

Add a table schema

Because we do not want to polute the public and import schema's, we create a new schema to create derived tables.

CREATE SCHEMA analyses;

Number of reports

First, we select all unique reports from the substance table and create a new table with the results.

CREATE TABLE analyses.icsr_temp AS 
  SELECT
    local_number,
    max(report_date) AS report_date,
    max(country) AS country 
  FROM import.substance
  GROUP BY local_number;

This should render 10187759 results

Then we insert the unique reports from the product table into the table generated in the previous query.

INSERT INTO analyses.icsr_temp 
  SELECT
    local_number,
    max(report_date) AS report_date,
    max(country) AS country 
  FROM import.product
  GROUP BY local_number;

Which should add another 3106773 into the icsr table.

Then we copy the reports into a table that merges the unique reports from the substance table and the product table so it cannot contain double records.

CREATE TABLE analyses.icsr AS
  SELECT
    local_number,
    max(report_date) AS report_date
  FROM analyses.icsr_temp
  GROUP BY local_number;

This should render 10210452 results.

We can now drop the temporary table.

DROP TABLE analyses.icsr_temp;

Let's see how much unique reports remain.

The final result is that we have 10210452 unique reports in the analyses.icsr table.

EMA reports 25.3 million Case Safety Reports (that represent 15 million cases) in the following document.

21 March 2023, EMA/900566/2022, H-Division, 2022 Annual Report on EudraVigilance for the European Parliament, the Council and the Commission, Reporting period: 1 January to 31 December 2022.

What we found is that the resulting table contains 10210452 rows, approx. 10 million unique reports.

Unique reports per year

We can also count the number of unique reports per year running the following query.

SELECT
  extract(year from date_trunc('year', report_date)) AS "year",
  count(*)
FROM analyses.icsr
GROUP BY date_trunc('year', report_date);

This should render:

year	count
2002	35
2003	11053
2004	19512
2005	77431
2006	124033
2007	157358
2008	465759
2009	331834
2010	244258
2011	289149
2012	415608
2013	479409
2014	485101
2015	479275
2016	452669
2017	592893
2018	732772
2019	761480
2020	675657
2021	1475566
2022	1518932
2023	420668

(22 rows)

From this table we can see that the database starts to really receive reports in 2003, reaching a tenfold of reports from 2006, seeing a steep rise in 2008 and eventually reaching 100 times the number of reports from the early 2000's in 2021 declining again in 2022 and given our reports were downloaded in July, our careful estimate is that this decline will probably continue in 2023.

Report versions

With a simple query, it is easy to see that reports when only looking at their local_number have multiple versions. Let's select the top 10 reports with the most versions.

SELECT
  local_number,
  count(*)
FROM import.substance
GROUP BY local_number
ORDER BY count(*) DESC LIMIT 10;

local_number	count
EU-EC-2950025	133
EU-EC-10015259316	123
EU-EC-10015258987	106
EU-EC-12370933	100
EU-EC-10014572231	99
EU-EC-10010813941	98
EU-EC-10015268466	94
EU-EC-10015163949	94
EU-EC-10015261376	94
EU-EC-10014947659	93

(10 rows)

Can a report span multiple days?

Let's check:

SELECT
  local_number,
  COUNT(DISTINCT report_date)
FROM import.substance
GROUP BY local_number
ORDER BY COUNT(DISTINCT report_date) DESC LIMIT 10;

local_number	count
EU-EC-1000000	1
EU-EC-10000000	1
EU-EC-10000000011	1
EU-EC-10000000016	1
EU-EC-10000000019	1
EU-EC-10000000021	1
EU-EC-10000000022	1
EU-EC-10000000024	1
EU-EC-10000000025	1
EU-EC-10000	1

(10 rows)

it seems it can't.

So what is causing multiple versions of a report?

We find that the report with local_number "EU-EC-2950025" has 133 versions. We will look at this report in more detail. First of all, we check if our previous conclusion that the date does not change holds.

SELECT
  local_number,
  report_date,
  COUNT(*)
FROM import.substance
WHERE
  local_number = 'EU-EC-2950025'
GROUP BY
  local_number,
  report_date;

local_number	report_date	count
EU-EC-2950025	2009-07-13	133

It does, all the versions are from exactly the same date (day). Let's look at the is_child_report boolean to see if at least one of the versions is marked as a parent.

SELECT
  local_number,
  report_date,
  is_child_report,
  COUNT(*)
FROM import.substance
WHERE
  local_number = 'EU-EC-2950025'
GROUP BY
  local_number,
  report_date,
  is_child_report;

It turns out they are all child reports.

local_number	report_date	is_child_report	count
EU-EC-2950025	2009-07-13	t	133

Could it have something to do with the substance_id?

SELECT
  local_number,
  report_date,
  substance_id,
  COUNt(*)
FROM
  import.substance
WHERE local_number = 'EU-EC-2950025'
GROUP BY
  local_number,
  report_date,
  is_child_report,
  substance_id;

local_number	report_date	substance_id	count
EU-EC-2950025	2009-07-13	14621	1
EU-EC-2950025	2009-07-13	14623	1
EU-EC-2950025	2009-07-13	14640	1
.....	.....	.....	......
EU-EC-2950025	2009-07-13	76884	1
EU-EC-2950025	2009-07-13	201161	1
EU-EC-2950025	2009-07-13	228030	1

(133 rows) (truncated for readability)

As the table shows: It does. This report spans 133 substances. We can conclude for this particular report that it is a report that is related to multiple substances. Our scripts to get line listings from EMA loops all substance of product identifiers and thus will return a list of reports per substance and product. This means that where the local_number should be the so called primary key across our database, it is impossible to use it that way in the downloading process but the import.product and import.substance tables should have a UNIQUE(substance_id, local_number) combination.

Do we have unique combinations of id and local_number?

Let's see if the combination product_id and local_number is unique:

ALTER TABLE import.product ADD CONSTRAINT
  product_unique_local_number_product_id
UNIQUE (product_id, local_number);

If this is the case, we should not get an error. It is indeed true. The output is

ALTER TABLE

We repeat the process for the substance table

ALTER TABLE import.substance ADD CONSTRAINT
  substance_unique_local_number_substance_id
UNIQUE (substance_id, local_number);

Again, we should not get an error, so the output should be

ALTER TABLE

We can now say that the product and substance tables have the integrity we want, that all the records are unique for the combination between the identifier of the substance(substance_id) or product(product_id) and the report identifier (local_number)

Reports per age group

It will be interesting to see how the age groups correlate. The original line-listings contain several age group definitions that we split up and stored as integers representing the number of months for the lower and upper age group boundaries (when given) because the smallest age group we found is "0 to 2 months" and we wanted to make sure we can detect that age group as well.

So, what age groups do we have and what can we tell about them in relation to reports? Let's take a look at the substance table.

SELECT 
  age_group_lower_months/12 age_from,
  age_group_upper_months/12 age_to,
  COUNT(DISTINCT local_number)
FROM import.substance
GROUP BY
  age_group_lower_months,
  age_group_upper_months
ORDER BY COUNT(DISTINCT local_number) DESC;

age_from	age_to	count
18	64	4769137
		2231876
65	85	2216231
85		315824
0	2	211532
3	11	204186
12	17	203213
0	0	35760

We can see that most reports are for the age group 18 to 64 years, followed by unspecified, followed by 65 to 85 years.

SELECT
  age_group_lower_months/12 age_from,
  age_group_upper_months/12 age_to,
  (age_group_upper_months - age_group_lower_months)/12 AS "range",
  COUNT(DISTINCT local_number)
FROM import.substance
GROUP BY age_group_lower_months, age_group_upper_months
ORDER BY age_group_lower_months DESC;

age_from	age_to	range	count
			2216231
85			315824
65	85	20	2231876
18	64	46	4769137
12	17	5	211532
3	11	8	204186
0	2	1	203213
0	0	0	35760

Ranges of the age groups are not equal. This makes them incomparable as f.i. the age group 18 to 64 years ranges 46 years while other groups are way smaller.

Reports per country

We decided to investigate, as an example, what the distribution is of the reports per country. For this we used public.substance_cases_per_country and public.product_cases_per_country these tables contain the country, the product/substance code and the number of cases for which it is unclear if these are reports or another undisclosed property either generated, aggregated or summarized from reports or other, undisclosed information. We use a query to summarize content of the tables.

SELECT
  sum(value)
FROM public.substance_cases_per_country;

Result: 6641886 cases. This is roughly 64% of the unique number of reports we found earlier.

SELECT
  sum(value)
FROM public.product_cases_per_country;

Result: 1351575 cases.

Together these are roughly 8 of the 10 million records that are publically exposed through EMA, and only 8 of the15 million cases as stated in the EMA annual report towards the European commission. These numbers therefor have to be taken with a grain of salt, but we can check if there is a trend in what is publicly available.

get the top 10 countries for substance

SELECT
 country,
 sum(value),
 floor(sum(value)/(
    SELECT
      sum(value)
    FROM public.substance_cases_per_country
  )*100)::numeric::integer AS percentage
FROM public.substance_cases_per_country
GROUP BY country
ORDER BY sum DESC LIMIT 10;

country	sum	percentage
France	1404436	21
Germany	1271894	19
Italy	791529	11
United Kingdom	522275	7
Netherlands	498157	7
Spain	365283	5
Sweden	281317	4
Austria	245752	3
Poland	164281	2
Denmark	132578	1

For substances, France is contributing ~21% of all cases, Germany ~19%, Italy ~11%, United Kingdom and the Netherlands both ~7%, Spain ~5%, Sweden ~4%, Austria ~3%, Poland ~2%, and the rest of the countries less then 2%. We roughly conclude that five countries contribute ~63% of all cases for substances.

Get the top 10 countries for product

select
  country, sum(value),
  floor(sum(value)/(
    select
      sum(value)
    from public.product_cases_per_country
  )*100)::numeric::integer as percentage
from public.product_cases_per_country
GROUP BY country order by sum desc limit 10;

country	sum	percentage
Germany	276184	20
France	265276	19
Italy	182996	13
United Kingdom	140469	10
Netherlands	75558	5
Spain	74248	5
Sweden	43867	3
Belgium	30437	2
Austria	35716	2
Greece	30392	2

For products, Germany is contributing ~20% of all cases, France ~19%, Italy ~13%, United Kingdom ~10%, Netherlands and Spain both ~5%, Sweden ~3%, Austria, Greece, Belgium ~2% and the rest of the countries less then 2%. We roughly conclude that four countries contribute ~62% of all cases for products.

It is strange that between product cases and substance cases there are discrepancy in the percentages which slightly alters the sequence of the “leader board”. But roughly most contributions are by France/Germany, Italy, UK/Netherlands, Spain.

So there seem to be four to five countries that produce more then 60% of the total number of cases that are reported out of the total of 27 in the European Economic Area.

We do however want to state that absolute numbers should be related to population numbers as bigger countries can f.i. have more health professionals and more reporting but lower distribution when it comes to the size of the population. in further study these numbers could be correlated with the UN or World Bank population numbers. This however, is outside the scope of this example.

Products

Top 10 products

select
  a.product_id,
  b.name,
  count(distinct a.local_number)
from import.product a 
  join import.products b 
  on a.product_id = b.product_id 
GROUP BY 
  a.product_id,
  b.name
order by 
  count(distinct a.local_number) desc
limit 10;

product_id	name	count
77936	REVLIMID	157288
45328	ENBREL	108091
58535	REMICADE	76957
58830	HUMIRA	73708
176663	ELIQUIS	64894
86304	PRADAXA	51304
47388	LYRICA	49658
58467	AVASTIN	39258
49090	PLAVIX	34943
12129169	KEYTRUDA	34024

Product details

The top count substance REVLIMID is a cancer medicine, the second most reported ENBREL is an anti-inflammatory drug. Let's take a look at some of the other products in detail. As an example we will take two products, Keytruda, also a cancer medicine and Inflectra another anti-inflammatory medicine.

Inflectra

Let's look for their product_id's:

SELECT * FROM
  import.products
WHERE
  name ILIKE '%flectra';

This gives us one results

product_id	name
435328	INFLECTRA

We will use the product_id's to dig deeper for information.

Seriousness

SELECT
  product_id,
  seriousness_criteria,
  COUNT(distinct local_number)
FROM import.product_reaction
WHERE product_id = 435328
GROUP BY
  product_id,
  seriousness_criteria
ORDER BY COUNT(distinct local_number )desc limit 10;

product_id	seriousness_criteria	count
435328	Other Medically Important Condition	11125
435328		11101
435328	Caused/Prolonged Hospitalisation	2439
435328	Caused/Prolonged Hospitalisation, Other Medically Important Condition	1833
435328	Results in Death	148
435328	Life Threatening	116
435328	Results in Death, Other Medically Important Condition	102
435328	Disabling	78
435328	Life Threatening, Caused/Prolonged Hospitalisation	54
435328	Disabling, Other Medically Important Condition	44

(10 rows)

Reporting range

We want to know when the first and the last reports where issued

SELECT
  product_id,
  min(report_date),
  max(report_date),
  count(distinct local_number)
FROM import.product_reaction
WHERE product_id = 435328
GROUP BY product_id;

product_id	min	max	count
435328	2014-08-20	2023-06-11	15106

Which shows that the first report on Inflectra was issued in 2014 and the last recently. It also shows that a total of 15106 reports on this product have been filed.

SELECT
  product_id,
  extract(year from date_trunc('year', report_date)) AS "year",
  count(distinct local_number)
FROM import.product_reaction
WHERE product_id = 435328
GROUP BY product_id, date_trunc('year',report_date);

product_id	year	count
435328	2014	10
435328	2015	142
435328	2016	383
435328	2017	613
435328	2018	933
435328	2019	1263
435328	2020	1650
435328	2021	1896
435328	2022	2811
435328	2023	5405

(10 rows)

We see a year over year rise in the number of reports then a rise of reports that almost doubles in 2023 and given the database dates from July 2023, a probably even steeper increase.

Keytruda

Let's look for their product_id's:

SELECT * FROM
 import.products
WHERE
 name ILIKE '%truda';

This gives us one results

product_id	name
12129169	KEYTRUDA

We will use the product_id's to dig deeper for information.

Seriousness

SELECT
  product_id,
  seriousness_criteria,
  COUNT(distinct local_number)
FROM import.product_reaction
WHERE product_id = 12129169
GROUP BY 
  product_id,
  seriousness_criteria
ORDER BY COUNT(distinct local_number) desc limit 10;

product_id	seriousness_criteria	count
12129169	Other Medically Important Condition	14657
12129169		6453
12129169	Caused/Prolonged Hospitalisation, Other Medically Important Condition	5917
12129169	Caused/Prolonged Hospitalisation	5462
12129169	Results in Death, Other Medically Important Condition	2725
12129169	Results in Death, Caused/Prolonged Hospitalisation, Other Medically Important Condition	1213
12129169	Results in Death	747
12129169	Life Threatening, Caused/Prolonged Hospitalisation, Other Medically Important Condition	620
12129169	Results in Death, Life Threatening, Caused/Prolonged Hospitalisation, Other Medically Important Condition	314
12129169	Caused/Prolonged Hospitalisation, Disabling, Other Medically Important Condition	255

Reporting range

We want to know when the first and the last reports where issued

SELECT
  product_id,
  min(report_date),
  max(report_date),
  count(distinct local_number)
FROM import.product_reaction
WHERE product_id = 12129169
GROUP BY product_id;

product_id	min	max	count
12129169	2014-10-13	2023-06-11	34024

Which shows that the first report on Keytruda was issued in 2014 and the last recently. It also shows that a total of 34024 reports on this product have been filed.

SELECT
  product_id,
  extract(year from date_trunc('year', report_date)) AS "year",
  count(distinct local_number)
FROM import.product_reaction
WHERE product_id = 12129169
GROUP BY product_id, date_trunc('year',report_date);

product_id	year	count
12129169	2014	35
12129169	2015	317
12129169	2016	1101
12129169	2017	1896
12129169	2018	4066
12129169	2019	5167
12129169	2020	4055
12129169	2021	4843
12129169	2022	7054
12129169	2023	5490

We see a year over year rise in the number of reports with a dip in 2020, then a rise of reports that almost doubles in 2022 and given this database was created in July 2023 a possible rise in 2023 as well.

Product literature

To see if we can get valuable information from the literature, we can use the PostgreSQL text search features functionality.

We can gather statistics about the most used words. We use "english" as the dictionary, this will remove common words like "the", "and", "is" from the results.

SELECT word, ndoc
   FROM ts_stat($$
     SELECT to_tsvector('english', literature)
     FROM import.product_literature AS t
  $$)
  where LENGTH(word) > 4
  order by
    ndoc desc,
    LENGTH(word) desc
  limit 10;

word	ndoc
journal	28961
patient	28746
societi	24586
japanes	22833
therapi	18066
treatment	17515
cancer	15571
annual	15029
clinic	14086
literatur	12188

(10 rows)

We can also search for literature references containing one ore more specific words, f.i. "covid". let's do that and find the 10 most referenced literature with the number of reports referencing them. As we will see, it is hard to detect any pattern, but there are more advanced ways to seek through text in PostgreSQL and we encourage everyone that is seriously interested in literature used in reports to look at the PostgreSQL documentation on text search.

select
  trim(literature),
  COUNT(DISTINCT local_number)
from import.product_literature
where
  to_tsvector('simple', literature) @@ to_tsquery('simple', 'covid')
GROUP BY trim(literature)
order by COUNT(DISTINCT local_number) desc limit 10;

Literature	count
Lopez G, Valero Zanuy M, Barrios I, et.al.. Acute hypertriglyceridemia in patients with covid-19 receiving parenteral nutrition. Nutrients. 2021;13 (7):.	29
Hoek A S R, Manintveld C O, Betjes G H M, Hellemons E M, Seghers L, Van Kampen A A J, et al.. COVID-19 in solid organ transplant recipients: a single-center experience. Transplant International. 2020;33:1099–1105	21
Elikowski W, Fertala N, Zawodna-Marszalek M, Rajewska-Tabor J, Swidurski W, Wajdlich D, Marszalek A, Skrzywanek P, Pyda M, Zytkiewicz M. [Marked self-limiting sinus bradycardia in COVID-19 patients not requiring therapy in the intensive care unit(https://pubmed.ncbi.nlm.nih.gov/34464372/)]	12
Aouba A, Baldolli A, Geffray L, Verdon R, Bergot E, Martin-Silva N, et al. Targeting the inflammatory cascade with anakinra in moderate to severe COVID-19 pneumonia: case series. Ann Rheum Dis. 2020;1-2.	9
Beyls C, Martin N, Hermida A, Et.al. Lopinavir-Ritonavir Treatment for COVID-19 Infection in Intensive Care Unit. Circulation: Arrhythmia and Electrophysiology. 2020 AUG;13(E008798):862-865.	9
Saez-Gimenez B, Berastegui C, Barrecheguren M, Revilla-Lopez E, Los Arcos I, Alonso R, Aguilar M et.al.. COVID-19 in lung transplant recipients: A multicenter study.. Am J Transplant.. 2021;21:1816-24	9
Omidi N, Forouzannia SK, Poorhosseini H, Tafti S.H.A, Salehbeigi S, Lotfi-Tokaldany M. Prosthetic heart valves and the COVID-19 pandemic era: What should we be concerned about?. Journal of cardiac surgery. 2020;35(10) 2500-5. doi:10.1111/jocs.14707	8
Elec A, Elec F, Oltean M, et. al.. COVID-19 after kidney transplantation: Early outcomes and renal function following antiviral treatment. Int J Infect Dis. 2021;104:426-432.	8
Giacchetti G.; Salvio G.; Gianfelice C. et al.. Remote management of osteoporosis in the first wave of the COVID-19 pandemic. Archives of Osteoporosis. 2022;17 (37):1-9	8
Havlir, D.. 32nd Annual Medical Management of HIV/AIDS and COVID-19. Live Stream Conference. 2020;unk:unk	7

Substances

Top 10 substances

select
  a.substance_id,
  b.name,
  count(distinct a.local_number)
from import.substance a 
  join import.substances b 
  on a.substance_id = b.substance_id 
GROUP BY 
  a.substance_id,
  b.name
order by 
  count(distinct a.local_number) desc
limit 10;

substance_id	name	count
42325700	COVID-19 MRNA VACCINE PFIZER-BIONTECH (TOZINAMERAN)	1034456
40995439	COVID-19 VACCINE ASTRAZENECA (CHADOX1 NCOV-19)	445801
40983312	COVID-19 MRNA VACCINE MODERNA (ELASOMERAN)	375783
50950	LENALIDOMIDE	165351
15756	INFLIXIMAB	124291
15535	ETANERCEPT	121607
20353	METHOTREXATE	110786
23893	ACETYLSALICYLIC ACID	108986
128670	CALCIUM CHLORIDE, SODIUM CHLORIDE, GLUCOSE, SODIUM LACTATE, MAGNESIUM CHLORIDE	106046
19980	LEVONORGESTREL	102116

(10 rows)

Segregate by substance and region

For this example, we want to check metrics of the number of reports per region and differentiate between COVID-19 vaccination substances and other substances.

We need to answer a couple of questions to proceed: Can we filter Economic Area and others? For this we use the country column and group and count all the substance and product table entries.

Checking the country

SELECT 
  country,
  COUNT(DISTINCT local_number)
from import.substance GROUP BY country;

country	count
European Economic Area	5107493
Non European Economic Area	5078995
Not Specified	1271

SELECT
  country,
  COUNT(DISTINCT local_number)
from import.product GROUP BY country;

country	count
European Economic Area	1240840
Non European Economic Area	1865899
Not Specified	34

We see that most reports are received from Non European Economic Area for both substances and products. Given that the EMA is a European database, this is remarkable.

Checking for substances with covid in the name

Now, can we differentiate records that are reported for a substance that has covid in it's name?

The query

SELECT
  COUNT(DISTINCT local_number)
FROM import.substance
WHERE 
  substance_id = ANY(ARRAY(
    SELECT substance_id FROM import.substances
    WHERE name ILIKE '%covid%'
  ));

produces 1862731

and

SELECT
  COUNT(DISTINCT local_number)
FROM import.substance
WHERE 
  substance_id NOT IN (
    SELECT substance_id FROM import.substances 
    WHERE name ILIKE '%covid%'
  );

produces 8338644

When we add these two results together, we get 10201375, but we need to keep in mind that reports can have multiple versions in the database: there are duplicates in the column for local_number as reports can have various versions. This needs to be taken into account. So will our sum equal the total of all unique reports in the table?

SELECT
  COUNT(DISTINCT local_number)
from import.substance;

This query produces 10187759 records. There are multiple substance_id's for a report and reports can end up in the results for having a substance with covid in the name involved combined with other substances. Differentiation between substances with covid in the name and others is possible although the results overlap given that the substance_id can fall in both of the categories we are investigating. Remember, they are examples and results should be used carefully before drawing conclusions. These queries require further investigation.

Creating the required subsets

First, we create a table that groups by the local_number, representing the unique ICSR and stack all values for substance_id, report_date and country for later use.

CREATE TABLE
 analyses.unique_substance_icsr as 
  select
   local_number,
   array_agg(distinct substance_id) substance_ids,
   array_agg(distinct report_date) as report_dates,
   array_agg(distinct country) as regions
from import.substance GROUP BY local_number;

produces 10187759 results.

from this table, we then create a new table containing all reports that are related to a substance with covid in the name:

CREATE TABLE analyses.covid_icsr as
  select
    local_number,
    report_dates[1],
    regions[1],
    array_length(substance_ids,1) as substance_count 
  from analyses.unique_substance_icsr
  WHERE
    substance_ids && ARRAY(
      SELECT substance_id::numeric::integer FROM import.substances
      WHERE name ILIKE '%covid%'
    )
  order by array_length(substance_ids, 1);

produces 1862731results

A similar query can be used to invert the selection by selecting all substance_id's where the substance_id does not contain the term covid. This way, we get the results we define as others:

CREATE TABLE analyses.non_covid_icsr as
  select
    local_number,
    report_dates[1],
    regions[1],
    array_length(substance_ids,1) as substance_count
  from analyses.unique_substance_icsr
  WHERE NOT substance_ids && ARRAY(
    SELECT substance_id::numeric::integer FROM import.substances
    WHERE name ILIKE '%covid%'
  )
  order by array_length(substance_ids, 1);

produces 8325028 results

This gives us the base for determining the positions numbered 1 to 4 in the matrix:

	COVID-19	Other
European Economic Area	1	2
Other	3	4

Segment 1

European Economic Area, substances with covid in the name

CREATE TABLE
  analyses.eea_covid as
    select
      report_dates,
      count(*)
    from
      analyses.covid_icsr
    where regions = 'European Economic Area'
    GROUP BY report_dates order by report_dates;

produces 911 (days on which reports are filed or altered)

Segment 2

European Economic Area, other substances

CREATE TABLE 
  analyses.eea_other as 
    select
      report_dates,
      count(*)
    from analyses.non_covid_icsr 
    where regions = 'European Economic Area'
    GROUP BY report_dates order by report_dates;

produces 6918 (days on which reports are filed or altered)

Segment 3

Other Area, substances with covid in the name

CREATE TABLE
  analyses.non_eea_covid as
    select
      report_dates,
      count(*)
    from analyses.covid_icsr
    where regions != 'European Economic Area'
    GROUP BY report_dates order by report_dates;

produces 865 (days on which reports are filed or altered)

Segment 4

Other Area, other substances

CREATE TABLE
  analyses.non_eea_other as
    select
      report_dates,
      count(*)
    from analyses.non_covid_icsr
    where regions != 'European Economic Area'
    GROUP BY report_dates order by report_dates;

produces 7029 (days on which reports are filed or altered)

Now we can perform a query to ask the database; on what date were most reports recieved on substances with covid in their name?

SELECT * from analyses.eea_covid order by count desc limit 1;

Which results in:

report_dates	count
2022-06-16	15608

Showing that most reports on substances with covid in their name where received on the 16th of July in 2022.

Can you think of some more queries?

Top 10 reactions

Reactions

Let's look at the reaction that are present to find the reactions with most reports.

select
  reaction,
  count(*)
from
  import.substance_reaction
GROUP BY reaction   
order by count(*) desc limit 10;

reaction	count
Pyrexia	840351
Headache	831209
Fatigue	752054
Nausea	656244
Drug ineffective	564964
Dyspnoea	481568
Diarrhoea	433502
Arthralgia	430680
Rash	429196
Dizziness	427772

Let's do another check and find the reactions with most reports and segregate by outcome

select
  reaction,
  outcome,
  count(*)
from
  import.substance_reaction
GROUP BY reaction, outcome
order by count(*) desc limit 10;

reaction	outcome	count
Pyrexia	Recovered/Resolved	418006
Drug ineffective		402969
Off label use		302933
Headache	Recovered/Resolved	300059
Death	Fatal	292108
Fatigue	Not Recovered/Not Resolved	248051
Nausea	Recovered/Resolved	241111
Fatigue		208296
Headache	Not Recovered/Not Resolved	201347
Nausea		191709

It is interesting to see that Death which has the outcome Fatal enters the top 10.

Fatal outcome

Let's find a top 10 with Fatal outcome.

select
 reaction,
 outcome,
 count(*)
from
 import.substance_reaction
where
 outcome ilike '%fatal%'
GROUP BY reaction, outcome
order by count(*) desc limit 10;

reaction	outcome	count
Death	Fatal	292108
Completed suicide	Fatal	81474
Toxicity to various agents	Fatal	67831
Cardiac arrest	Fatal	34689
Pneumonia	Fatal	27633
Sepsis	Fatal	27031
Drug abuse	Fatal	26598
Malignant neoplasm progression	Fatal	23101
Multiple organ dysfunction syndrome	Fatal	22958
Respiratory failure	Fatal	21990

Death is not always fatal

As extra analyses, we group the reports where the reaction contains the word death we assume not all reporting has been entered correct as it seems possible to have death as a reaction where the outcome is not equal to "Fatal". An interesting combination of reaction and outcome is "Recovered/Resolved" from "Foetal death". Foetal death is a synonym for "still birth", the death of a foetus needs not be fatal for the mother, whilst it is for the unborn child.

select
  reaction,
  outcome,
  count(*)
from
  import.substance_reaction
where
  reaction ilike '%death%'
GROUP BY reaction, outcome
order by count(*) desc limit 25;

reaction	outcome	count
Death	Fatal	292108
Sudden death	Fatal	9284
Near death experience		2844
Death	Not Specified	2456
Foetal death		2341
Sudden cardiac death	Fatal	2090
Sudden infant death syndrome	Fatal	1850
Foetal death	Fatal	1824
Death		1786
Death neonatal	Fatal	1652
Brain death	Fatal	1428
Fear of death		1198
Death	Not Recovered/Not Resolved	1171
Accidental death	Fatal	917
Cardiac death	Fatal	809
Foetal death	Recovered/Resolved	754
Apparent death	Recovered/Resolved	681
Sudden cardiac death		641
Cell death		562
Near death experience	Recovered/Resolved	560
Cell death	Recovered/Resolved	544
Sudden death		497
Fear of death	Recovered/Resolved	464
Sudden unexplained death in epilepsy	Fatal	461
Foetal death	Not Recovered/Not Resolved	446

(25 rows)

Top 10 reactions for Tozinameran

The number one substance we found earlier was Tozinameran. To find the top 10 reactions with their outcome for the Tozinameran substance, we first lookup the substance_id:

SELECT 
  *
from import.substances
where name ilike '%TOZINAMERAN%';

substance_id	name
42325700	COVID-19 MRNA VACCINE PFIZER-BIONTECH (TOZINAMERAN)
60123556	COVID-19 MRNA VACCINE PFIZER-BIONTECH ORIGINAL/OMICRON BA.1 (TOZINAMERAN, RILTOZINAMERAN)
60141237	COVID-19 MRNA VACCINE PFIZER-BIONTECH ORIGINAL/OMICRON BA.4-5 (TOZINAMERAN, FAMTOZINAMERAN)

Let's select substance_id 42325700 and see in the reaction table for substances which reactions are related to this substance_id, we will group them by reaction and outcome to get an general idea and limit the result to 10.

select
  reaction,
  outcome,
  count(*)
from
  import.substance_reaction
where substance_id = 42325700
GROUP BY substance_id, reaction, outcome
order by count(*) desc limit 10;

reaction	outcome	count
COVID-19		98917
Vaccination failure		88821
Headache	Recovered/Resolved	71871
Pyrexia	Recovered/Resolved	69366
Fatigue	Not Recovered/Not Resolved	49876
Headache	Not Recovered/Not Resolved	49474
Fatigue	Recovered/Resolved	47029
Myalgia	Recovered/Resolved	43603
Chills	Recovered/Resolved	40605
Headache	Recovering/Resolving	34880

In 98917 reports, the reaction was COVID-19. In 88821cases the vaccination failed.

Substance suspect drug list

Select the top 10 of drugs reported on for substances for the suspect drug list table. This table is accidentally named subject instead of suspect.

select
  drugs,
  COUNT(DISTINCT local_number)
from import.substance_subject_drug_list
GROUP BY drugs 
order by COUNT(DISTINCT local_number) desc
limit 10;

drugs	count
COMIRNATY [TOZINAMERAN]	904812
COVID-19 VACCINE ASTRAZENECA	337828
SPIKEVAX [COVID-19 MRNA VACCINE MODERNA	316281
VAXZEVRIA [COVID-19 VACCINE ASTRAZENECA	204072
REVLIMID [LENALIDOMIDE]	155625
TOZINAMERAN [TOZINAMERAN]	117032
ENBREL [ETANERCEPT]	101998
[CLOZAPINE]	88094
[LEVONORGESTREL]	82590
REMICADE [INFLIXIMAB]	73876

(10 rows)

Links have been added to the entries in the EPAR information from the EMA website.

"COMIRNATY contains tozinameran, a messenger RNA" as stated here and we find TOZINAMERAN as a substance in the substances list too.

It is interesting to see that most reports are related to a COVID-19 drug, namely the first four entries plus the sixth. From the top-10, the sum of all reports related a COVID-19 drug totals 1880025 if we relate this to the total number of unique reports: 10210452, about 20% of all reports are COVID-19 medicine related.

Another interesting medicine in the top ten list of most reported drugs, is REVLIMID which is a medicine used for the treatment of certain cancers and serious conditions affecting blood cells and bone marrow, namely multiple myeloma, myelodysplastic syndromes and mantle cell and follicular lymphoma.

Substance concomitant drug list

Select the top 10 of drugs reported on for substances for the concomitant drug list table.

select
  drugs,
  COUNT(DISTINCT local_number)
from import.substance_concomitant_drug_list
GROUP BY drugs 
order by COUNT(DISTINCT local_number) desc
limit 10;

drugs	count
[PARACETAMOL]	213671
[ACETYLSALICYLIC ACID]	205282
[OMEPRAZOLE]	132309
[PREDNISONE]	129979
[LEVOTHYROXINE SODIUM]	127684
[FUROSEMIDE]	126576
[SIMVASTATIN]	118803
[METFORMIN, METFORMIN HYDROCHLORIDE, METFORMIN HYDROCHLORIDE BP]	106566
[ATORVASTATIN]	101405
[FOLIC ACID]	101267
(10 rows)

(10 rows)

It seems from the suspect drug list that medicine names have no square brackets in their name and that the content between square brackets is a substance. Paracetamol comes from a lot of different brands and maybe the EMA decided to group them and omit the name. This can however not be concluded from the data.

Advanced

Menstrual issues

A recent strong increase in heavy menstrual bleeding, was mentioned in the British Medical Journal (BMJ). It is unclear if EMA has looked into this based on their data, what happens when we do it?

select
  reaction,
  count(distinct local_number)
from import.product_reaction 
where reaction ilike '%menstru%' 
group by reaction 
order by count(distinct local_number) desc limit 10;

We can see that related to products, there are 1920 reports mentioning Heavy menstrual bleeding as (one of the) reaction(s).

reaction	count
Heavy menstrual bleeding	1920
Intermenstrual bleeding	1100
Menstrual disorder	1043
Menstruation irregular	957
Menstruation delayed	819
Premenstrual syndrome	91
Premenstrual pain	18
Menstrual discomfort	17
Menstruation normal	10
Menstrual cycle management	6

select
 reaction,
 count(distinct local_number)
from import.substance_reaction 
where reaction ilike '%menstru%' 
group by reaction 
order by count(distinct local_number) desc limit 10;

reaction	count
Heavy menstrual bleeding	48794
Menstrual disorder	30349
Intermenstrual bleeding	26426
Menstruation irregular	25405
Menstruation delayed	18877
Premenstrual syndrome	2325
Menstrual discomfort	1336
Premenstrual pain	921
Menstrual cycle management	300
Menstruation normal	294

For substances it is 48794reports.

We want to see if any substances stick out.

select a.name, b.* from 
(select
   --reaction,
   substance_id,
   count(distinct local_number) count
 from import.substance_reaction 
 where reaction = 'Heavy menstrual bleeding'
 group by substance_id, reaction) b
 JOIN import.substances a on a.substance_id = b.substance_id
 order by b.count desc limit 10;

name	substance_id	count
COVID-19 MRNA VACCINE PFIZER-BIONTECH (TOZINAMERAN)	42325700	25570
COVID-19 MRNA VACCINE MODERNA (ELASOMERAN)	40983312	7791
COVID-19 VACCINE ASTRAZENECA (CHADOX1 NCOV-19)	40995439	5062
LEVONORGESTREL	19980	3405
ETONOGESTREL	18832	1634
DESOGESTREL	18500	383
ETHINYLESTRADIOL, ETONOGESTREL	39209	347
LACTOSE	25479	288
DROSPIRENONE, ETHINYLESTRADIOL	59494	242
COVID-19 MRNA VACCINE PFIZER-BIONTECH ORIGINAL/OMICRON BA.1 (TOZINAMERAN, RILTOZINAMERAN)	60123556	212

It seems that, given that "Heavy menstrual bleeding" totals 48794 reports for the substance table and 38423 reports from that same table are related to COVID-19 vaccines, we feel that this should be something of concern. This appears to be something that justifies further research, also by EMA.

Product, Substance, Drug

Throughout the database, we find products and substances but drugs are also mentioned. We are curious about the overlap and the differences and would like to see if we can make head or tails from the different types.

We have tables for products and substances. Is it possible to make a table that contains both and add all entries found for drugs as well? Let's first look at the unique entries in the drug list tables.

SELECT
  'product_suspect_drug_list' as source,
  count(distinct drugs)
FROM
  import.product_subject_drug_list
UNION
SELECT
  'product_concomitant_drug_list' as source,
  count(distinct drugs)
FROM
  import.product_concomitant_drug_list
UNION
SELECT
  'substance_suspect_drug_list' as source,
  count(distinct drugs)
FROM
  import.substance_subject_drug_list
UNION
SELECT
  'substance_concomitant_drug_list' as source,
  count(distinct drugs)
FROM
  import.substance_concomitant_drug_list
UNION
SELECT
  'products' as source,
  count(*)
FROM
  import.products
UNION
SELECT
  'substances' as source,
  count(*)
FROM
  import.substances;

source	count
substances	4333
substance_suspect_drug_list	39350
product_suspect_drug_list	18960
product_concomitant_drug_list	34939
substance_concomitant_drug_list	65836
products	1469

As we can see there is a big difference between the amount of products, substances and drugs.

The drug lists contains textual information which seems to be a combination of medicine (not always present as product and sometimes omitted) and substances between square brackets.

We are going to try to get everything into a new table to see if we can find any relations.

CREATE TABLE analyses.temp_names as
  SELECT distinct drugs as "words" from import.product_subject_drug_list;
insert into analyses.temp_names
  SELECT distinct drugs from import.substance_subject_drug_list;
insert into analyses.temp_names
  SELECT distinct drugs from import.product_concomitant_drug_list;
insert into analyses.temp_names
  SELECT distinct drugs from import.substance_concomitant_drug_list;
insert into analyses.temp_names
  SELECT distinct name from import.products;
insert into analyses.temp_names
  SELECT distinct name from import.substances;
create table analyses.name as select distinct words from analyses.temp_names;
drop table analyses.temp_names;

SELECT count(*) from analyses.name;

count
84990

We find that the result of getting the distinct set of names, words or whatever we should call this is less then the sum of all found words from all tables that contain names or descriptions for substance, product or drug. Can we find intersections and relations?

At least we now have one long list to search through. Let's do a little experiment.

Mentions of a product

select * from analyses.name where words ilike '%comirnaty%';

words
COMIRNATY 10 MICROGRAMS/DOSE CONCENTRATE FOR DISPERSION FOR INJECTION COVID-19 MRNA VACCINE
COMIRNATY 10 MICROGRAMS/DOSE CONCENTRATE FOR DISPERSION FOR INJECTION [TOZINAMERAN]
COMIRNATY 10 MICROGRAMS/DOSE [TOZINAMERAN]
COMIRNATY 30 MICROGRAMS/DOSE CONCENTRATE FOR DISPERSION FOR INJECTION COVID-19 MRNA VACCINE
COMIRNATY 30 MICROGRAMS/DOSE CONCENTRATE FOR DISPERSION FOR INJECTION [TOZINAMERAN]
COMIRNATY 30 MICROGRAMS/DOSE DISPERSION FOR INJECTION COVID-19 MRNA VACCINE
COMIRNATY 30 MICROGRAMS/DOSE DISPERSION FOR INJECTION [TOZINAMERAN]
COMIRNATY 30 MICROGRAMS/DOSE [TOZINAMERAN]
COMIRNATY 3 MICROGRAMS/DOSE CONCENTRATE FOR DISPERSION FOR INJECTION COVID-19 MRNA VACCINE
COMIRNATY CONCENTRATE FOR DISPERSION FOR INJECTION 10 MICROGRAMS/DOSE [TOZINAMERAN]
COMIRNATY CONCENTRATE FOR DISPERSION FOR INJECTION 30 MICROGRAMS/DOSE [TOZINAMERAN]
COMIRNATY CONCENTRATE FOR DISPERSION FOR INJECTION COVID-19 MRNA VACCINE
COMIRNATY CONCENTRATE FOR DISPERSION FOR INJECTION [TOZINAMERAN]
COMIRNATY COVID-19 MRNA VACCINE
COMIRNATY DISPERSION FOR INJECTION [TOZINAMERAN]
COMIRNATY ORIGINAL/OMICRON BA.1
COMIRNATY ORIGINAL/OMICRON BA.1 DISPERSION FOR INJECTION
COMIRNATY ORIGINAL/OMICRON BA.1 [TOZINAMERAN, RILTOZINAMERAN]
COMIRNATY ORIGINAL/OMICRON BA.4-5
COMIRNATY ORIGINAL/OMICRON BA.4-5 COVID-19 MRNA VACCINE
COMIRNATY ORIGINAL/OMICRON BA.4-5 [TOZINAMERAN, 5'-CAPPED MRNA ENCODING SARS-COV-2, OMICRON VARIANTS BA.4 AND BA.5, SPIKE PROTEIN, PRE-FUSION STABILISED
COMIRNATY ORIGINAL/OMICRON BA.4-5 [TOZINAMERAN, FAMTOZINAMERAN]
COMIRNATY [TOZINAMERAN]

As seen, there are products and substances mentioned in the drug lists, but also information about the dose and sometimes even unrelated terms.

Language specific information

One of the things we do not seem to be able to extract from the EMA database is the country a report originates from. But looking at the created analyses.name table shows something interesting. There is information in the words that are language specific. If we look for instance for the text per dag which is Dutch for per day, we find 3 rows.

select * from analyses.name where words ilike '%per dag%';

words
1000MG 3X PER DAG [NOT AVAILABLE]
DE PIL 30 MG PER DAG [NOT AVAILABLE]
KOELZALF 3X PER DAG OP DE HUID AANBRENGEN; [NOT AVAILABLE]

Another particular Dutch word is prik let's see if we can find that too:

select * from analyses.name where words ilike '%prik%';

words
6DE PRIK DT [NOT AVAILABLE]
BOOSTRIX POLIO INJSUSP WWSP 0,5ML / 4DE PRIK DKTP/HIB/HEPATITIS B [NOT AVAILABLE]
EEN PRIK IN MIJN SCHOUDER TEGEN DE SLIJMBEURSONTSTEKING [NOT AVAILABLE]
RUGGENPRIK [NOT AVAILABLE]
RUGGEPRIK [NOT AVAILABLE]
TETANUS PRIK [NOT AVAILABLE]

Here we also discover the Dutch word tegen meaning against. Let's look for that too.

select * from analyses.name where words ilike '%tegen%';

words
2 GANGBARE MEDICIJNEN TEGEN HOGE BLOEDDRUK. [NOT AVAILABLE]
AANSTIPMIDDEL TEGEN ACNE [NOT AVAILABLE]
DIVERSE PUFJES TEGEN MIJN COPD [NOT AVAILABLE]
EEN PRIK IN MIJN SCHOUDER TEGEN DE SLIJMBEURSONTSTEKING [NOT AVAILABLE]
GENEESMIDDEL TEGEN HOOIKOORTS EN ANDERE ALLERGIEKLACHTEN [NOT AVAILABLE]
IETS TEGEN MISSELIJKHEID [NOT AVAILABLE]
MIDDEL TEGEN BOEZEM FIBRILEREN [NOT AVAILABLE]
MIDDEL TEGEN TRAGE SCHILDKLIER [NOT AVAILABLE]
ONTSTEKINGSREMMER TEGEN ARTRITIS PSORIATICA [NOT AVAILABLE]
PUFJE TEGEN ASTMA [NOT AVAILABLE]
TEGEN DE MISSELIJKHEID EN TEGEN EEN ALLERGISCHE REACTIE [NOT AVAILABLE]
ZALFJES TEGEN PSORIASIS [NOT AVAILABLE]

Although we cannot discover in detail from which country a report originates, we can determine that the Dutch language is used in specific reports. Maybe this method can also be used to investigate other languages?

Word cloud

To get insight in the most used words, we generate a table that can be used to create a word cloud.

WITH words AS (
    SELECT unnest(string_to_array(lower(words), ' ')) AS word
    FROM analyses.name
)
SELECT word, count(*) as frequency  FROM words
GROUP BY word order by count(*) desc limit 25;

word	frequency
[not	16887
available]	16871
acid,	5725
sodium	5111
not	4488
hydrochloride,	4425
available	4292
for	3875
mg	3647
chloride,	3021
calcium	2874
hydrochloride]	2603
	2424
magnesium	2259
nos]	1974
solution	1913
potassium	1850
unspecified	1850
chloride	1742
medication	1633
nos,	1558

Maybe we should clean up special characters like ] and , but for now this will give us an idea.

We generate a file containing all words from the previous query.

COPY (SELECT unnest(string_to_array(lower(words), ' ')) AS word
    FROM analyses.name) TO '/var/lib/postgres/wordcloud.csv' (format CSV);

Uploading this file to Free World Cloud Generator renders a nice word cloud.

Word cloud

If we ignore the word "available" for now, it is interesting to see words like (hydro)chloride, aluminium and acid and another Dutch word: "pollen".

Can you detect the most used medicine in the word cloud?

You now should have a general idea about what is available by gathering textual information and stitching it together. Can you think of more advanced uses?

Ivermectine

EMA advises against the use of Ivermectine for COVID-19 prevention treatment.

"Although ivermectin is generally well tolerated at doses authorised for other indications, side effects could increase with the much higher doses that would be needed to obtain concentrations of ivermectin in the lungs that are effective against the virus. Toxicity when ivermectin is used at higher than approved doses therefore cannot be excluded."

Wouldn't it be good to investigate what side effects (reactions) have been reported in EudraVigilance for Ivermectine? This chapter gives some examples of what questions could be asked to the database. Can you come up with more questions? Is Ivermectine, a product or a substance? Is it in any of the drug lists?

Is it a product?

select * from import.products where name ilike '%iverm%';

product_id	name

(0 rows)

Ivermectine is not found.

Is it a substance?

select * from import.substances where name ilike '%iverm%';

substance_id	name
23261	IVERMECTIN

(1 row)

Results in one row and a substance_id. When we check the substance table, we find 3015 unique reports for this substance.

select
  'substance reports' as reported_on,
  count(distinct local_number)
from import.substance where substance_id = 23261;

reported_on	count
substance reports	3015

(1 row)

Comparing outcome

What outcomes do the reports on the Ivermectine substance have?

select
  outcome,
  count(distinct local_number)
from
  import.substance_reaction
where substance_id = 23261 group by outcome;

outcome	count
Fatal	282
Not Recovered/Not Resolved	461
Not Specified	4
Recovered/Resolved	1661
Recovered/Resolved With Sequelae	24
Recovering/Resolving	350
	931

How does this compare to f.i. the outcomes for the "COVID-19 MRNA VACCINE PFIZER-BIONTECH (TOZINAMERAN)"" substance?

select
  outcome,
  count(distinct local_number)
from
  import.substance_reaction
where substance_id = 42325700 group by outcome;

outcome	count
Fatal	13723
Not Recovered/Not Resolved	354461
Recovered/Resolved	357564
Recovered/Resolved With Sequelae	27910
Recovering/Resolving	214203
	351899

We know that there are 3015 reports on the Ivermectine substance. We know there are 1034542 reports on the Tozinameran substance with the substance_id 42325700. What is the percentage of "Fatal" outcomes for each of these substances?

Ivermectine: 282 / 3015 = 0.0935.. ~ 9%
Tozinameran: 13723 / 1034542 = 1.3264.. ~ 1%

And for "Recovered/Resolved"?

Ivermectine: 1661 / 3015 = 0.5509.. ~ 55%
Tozinameran: 357564 / 1034542 = 1.3264.. ~ 34%

We have not looked at the dates of the reports and maybe we are over generalizing. But it is interesting to break this research down further. Can you think of more advanced questions you want answered in comparing substances?

Can we find Ivermectine in the drug lists?

Is Ivermectine mentioned in any of the drug lists? Let's find out and by joining all drug_list counts for the unique number of reports.

select 
  'product'as class,
  'suspect' as list_type,
  count(distinct local_number) as count
from import.product_subject_drug_list
where drugs ilike '%iverm%'
UNION
select 
  'product'as class,
  'concomitant' as list_type,
  count(distinct local_number) as count
from import.product_concomitant_drug_list
where drugs ilike '%iverm%'
UNION
select 
  'substance'as class,
  'suspect' as list_type,
  count(distinct local_number) as count
from import.substance_subject_drug_list
where drugs ilike '%iverm%'
UNION
select 
  'substance'as class,
  'concomitant' as list_type,
  count(distinct local_number) as count
from import.substance_concomitant_drug_list
where drugs ilike '%iverm%';

class	list_type	count
substance	suspect	2993
substance	concomitant	695
product	suspect	51
product	concomitant	136

There are mentions in the drug lists. It would be good to relate them back to substances and products.

Visualisation

Showing query results as tables in a document only works up to a certain extent. When you want to display large datasets or data evolving over time, a chart or graph may offer better insight.

So we searched the internet to find the right tool that would quickly offer us the ability to create visual enhanced insight. A lot of tooling is available, but few of them are open source, which is one of the requirements we had. We also wanted to be as "cross-platform" as possible, a tool that can run in a browser would be great.

We settled with Metabase, the software version used in this document is Metabase v0.46.5, built on 2023-06-14.

Docker

The only requirement for Metabase is Docker which is available for Windows, Linux and Mac. You can follow the download and install instructions from the website.

Run Metabase

Once Docker is installed, Metabase can be started by running:

docker run -d -p 3000:3000 --name metabase metabase/metabase

This is the default way of installing the open source edition as described on the Metabase website (https://www.metabase.com/start/oss). After running the previous command, point the browser to http://localhost:3000, set up a username and password and connect to the postgresql database directly.

When you open the webpage, you will be greeted by a welcome screen. This screen will show a Let's get started button at first use.

After pressing the button, you will be prompted to select your language. We select English.

Next, you need to set up a user account. This account will be local to your installation, so don't worry about what you enter here, as long as you remember the email address you use and the password.

Then you need the to select a Database type, in our case this will be PostgreSQL, the option with the blue elephant.

Because Metabase runs isolated, you cannot use localhost as a host. Make sure you enter the right credentials and select the ip-address to your computer.

When finished, press the Connect database button.

If all went well, you should see Step 4 showing a Finish button. We decided to disallow Metabase to anonymously collect usage events.

You can then decide to sign up for a the Metabase newsletter, but we skip it and press the Take me to Metabase button.

If all went well, you should have a Metabase dashboard like the next picture. You can press the buttons in the dashboard to get some general insights into our database.

The Metabase dashboard will be used throughout the visualisation chapter. More information on Metabase can be found in the Documentation section of the Metabase website

Add the map of Europe

If you want to visualise data on a map, you can use this Map of Europe with which it is possible to attach the country field from the tables in the public namespace as shown in the example below.

Substance

In the visualisation chapter for substances we will visualise the queries we ran in the analyses chapter.

Substance top-10

In Metabase, select the EMA database from the Home page clicking the link.

You will be taken to a page that shows all the tables in the database.

Click the Substance button. You should now see the Substance table.

We press the + New button in the top right of the screen and select SQL query.

And end up with a blank sheet

In the input area of the page, we enter the query that we find at analyses - substances - substance top-10

And press the blue play button in the lower right corner.

A screen appears showing that the data is being loaded (doing science).

When the query finishes running, your screen should show the query in the top section and a table in the bottom section like in the next screenshot.

You can now select the Visualization button that opens a side panel.

Select the Row visualisation to get the picture shown below.

If you hover your mouse over the now selected Row button, you will see a cog symbol appear. Click it. A new side panel opens with Row options

Remove the substance_id from the X-axis

You will see the visualisation change, now showing the substance top-10 in a horizontal row chart.

Substance reports per month

In Metabase, select the EMA database from the Home page clicking the link.

You will be taken to a page that shows all the tables in the database.

Click the Substance button. You should now see the Substance table.

Press the Report Date button above the column showing the dates and select Distribution

Metabase should render a line graph.

Substance reaction per month

In Metabase, select the EMA database from the Home page clicking the link.

You will be taken to a page that shows all the tables in the database.

Click the Substance Reaction button. You should now see the substance_reaction table.

Press the Summarize button in the top right corner. In the side panel that opens, Make sure that it the Count button is green. Hover over the report data and change the value by day to by month. Now hover over Outcome to see a + appear to add grouping. Click it.

We need a filter, to hide the line with the empty outcome. Press Filter in the top right. A screen will pop up where we will select "Outcome Not empty" as shown in the next screenshot. When done, press Apply Filters

After Metabase retrieves the results from the database, the display should look like the next screenshot.

Metabase offers a variety of options to tweak the visualisation. We replaced COUNT with Distinct values of Local Number which takes a while to render but makes sure only one instance of each report version is used in the visualisation. Since a lot of reports are connected to multiple substances, the Y-axis maximum value changes from 280000 to 90000. Make sure you use distinct reports in your visualisations too!

We notice a sharp increase sometime after the start 2020. When we zoom in to January 2020 to July 2023, we get a more detailed view of what is happening.

In the zoomed in picture, we can see that from January 2021, the total number of reports explodes and then slowly decreases, but the number of reports that result in Fatal and Recovered/Resolved With Sequelae differ in the hundreds which make their line appear flat. Let's isolate these two lines by turning the others off. This can be done from the left side panel that opens when we press the Visualization button and select the Cog symbol in the upper right corner of on the Line Graph button that appears when we hover that button.

The number of reports with Fatal outcome fluctuate, but stay above 2417 for this period in time. If we take the whole period of the database, we see two peak periods in reports with Fatal outcome; February to June 2008 and September 2012 to January 2012. We can also see that after February 2011 the reports with Fatal outcome no longer drop below 2417 until the end of the total dataset.

We strongly encourage everyone to create their own visualisation and describe their choices so results can be reproduced and discussed.

Product

In the visualisation chapter for products we will visualise the queries we ran in the analyses chapter for products.

Product reaction per year

In Metabase, select the EMA database from the Home page clicking the link.

You will be taken to a page that shows all the tables in the database.

We created the following visualisation where we:

Summarized by Distinct values of Local Number
Grouped by Outcome
Aggregated the Report date by year

The screenshot shows the result, we challenge you to create a similar presentation.

We changed the sequence of the results and changed the colors for the bars in the left panel.

Did you manage to get the same presentation? Let's look at it in detail and draw example conclusions.

A couple of things we notice:

The number of reports on product with a Fatal outcome seems to be stable or even decline a little.
The number of reports on product that have an unspecified or unset outcome has been rising and kept rising until 2020 to drop a little from there.
The number of reports on product is declining since 2020
The number of reports on product with outcome Recovering/Resolving and Recovered/Resolved has been rising, but is declining since 2020.

Can you think of more advanced uses? We welcome discussion and suggestions, also about misinterpretation. Let us know!

Glossary

More terms are explained in the adrreports.eu Glossary

Adverse Drug Reaction Report (Adrr)

The adrreports.eu portal provides public access to reports of suspected side effects submitted to the EudraVigilance system by national medicines regulatory authorities and pharmaceutical companies that hold marketing authorisations for medicines in the European Economic Area (EEA).

see: Adrreports.eu

Concomitant

Synonym for "accompanying". A descendant of Latin concomitari ("to accompany") and ultimately of "comes," the Latin word for companion.

source: Merriam Webster Dictionary

Drug

Synonym for Medicine as stated on Adrreports.eu: "Centrally authorised medicine or non-centrally authorised medicine"

"For centrally authorised medicines, access to reports is granted both by the name of the medicine or the name of the active substance. For non-centrally authorised medicines, access is granted based on the name of the active substance only."

As an example: the most reported substance we find in the database is Tozinameran, Comirnaty is the product it is used in, BioNTech/Pfizer-vaccin is the drug that would be administered.

_see: Adrreports.eu (EN)

Drug List

Information that shows characterisation of ‘Drug Role’, defined as suspect, interacting, concomitant or drug not administered. Based on this data element, 2 different ‘Drug’ (medicines) lists will be created, for suspect and interacting drugs and for concomitant or drug not administered. Reported medicinal product, displayed as recoded against the Extended EudraVigilance Medicinal Product Dictionary for centrally authorised products (for non-centrally authorised products, only the recoded substance will be displayed where reported). Substance / Specified Substance Name, displayed as recoded against the Extended EudraVigilance Medicinal Product Dictionary (if not, it will be displayed as reported). Indication of the medicinal product described as MedDRA Preferred Term. ‘Duration of Drug Administration’, as reported or based on ‘Drug Administration Start Date’ and ‘End Date’, Dose of the medicine and Route of administration of the medicine.

source: Web Report User Guide

EMA

Abbreviation for "European Medicine Agency".

"The European Medicines Agency (EMA) is a decentralised agency of the European Union (EU). It began operating in 1995. The Agency is responsible for the scientific evaluation, supervision and safety monitoring of medicines in the EU."

source: EMA website

EudraVigilance

"The system for managing and analysing information on suspected adverse reactions to medicines which have been authorised or being studied in clinical trials in the European Economic Area (EEA). The European Medicines Agency (EMA) operates the system on behalf of the European Union (EU) medicines regulatory network."

source: EMA Website, EudraVigilance

Line listing

Data downloaded from the EudraVigilance webpage.

"The data and information provided at substance level is in the form of an electronic Reaction Monitoring Report (eRMR) containing aggregated data and a line listing with details of the individual cases. Users are also able to retrieve Individual Case Safety Report (ICSRs) forms accessible through the line listing."

see: EudraVigilance User Manual Marketing Authorisation Holders

Local number

The local number is the unique identifier also known as "Case Identifier format" as stated on page 89, #12 of the EU Individual Case Safety Report (ICSR)¹ Implementation Guide. It consists of

"Country code" followed by a dash, "organisation ID or name" followed by a dash and "local system number"

f.i.

EU-EC-2950025

In the public EMA database, Country code has been replaced by EU and organisation ID by EC (abbr. for European Commission) for all the reports.

mdbook

A tool used for generating documentation from markdown files. mdbook documentation website

Postgresql

The database system we use. PostgreSQL website

Report

Used throughout this documentation as synonym for ICSR or Individual Case Safety Report.

"Used for reporting to the EudraVigilance database suspected adverse reactions to a medicinal product that occur in a single patient at a specific point in time"

see: EU Individual Case Safety Report (ICSR)¹ Implementation Guide

SQL

"A domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables."

source: Wikipedia

Drug List

source: Web Report User Guide

Database schema

Country data

Table "public.product_cases_per_country"

Column	Type	Collation	Nullable	Default
high level code	bigint
country	text
value	bigint

Table "public.substance_cases_per_country"

Column	Type	Collation	Nullable	Default
high level code	bigint
country	text
value	bigint

Line Listings

All line listings contain field names on the first row. These field names are mapped to convenient database table column names for better readability. This table contains the mapping of the fields to columns plus a description.

CSV field name	Database column name	Description
EU Local Number	`local_number`	The identifier of the individual case safety repport (ICSR)
Report Type	`report_type`	the value of this field is always "Spontanous"
EV Gateway Receipt Date	`report_date`	Date/time the report was issued
Primary Source Qualification	`qualification`	Can be "Healthcare Professional", "Non Healthcare Professional" or "Not Specified"
Primary Source Country for Regulatory Purposes	`country`	Can be "European Economic Area" "Non European Economic Area" or "Not Specified"
Literature Reference	table `import.___literature`
Patient Age Group	`age_group_lower_months` `age_group_upper_months`	split in lower and upper values in months as this is the lowest resolution present f.i. for children aged 0 to 2 months. Calculating the lower and upper age value allows measuring withing groups as the age groups are sometimes shifted.
Patient Age Group (as per reporter)	N/A	All rows in the CSV files contained the value "Not specified" so we skipped it
Parent Child Report	`is_child_report`	Can be `true` or `false`
Patient Sex	`sex`	Can be "Female" or "Male"
Reaction List PT (Duration – Outcome - Seriousness Criteria)	table `import.___reaction`
Suspect/interacting Drug List (Drug Char - Indication PT - Action taken - [Duration - Dose - Route])	table `import.___subject_drug_list`
Concomitant/Not Administered Drug List (Drug Char - Indication PT - Action taken - [Duration - Dose - Route])	table `import.___concomitant_drug_list`
ICSR Form	N/A	Skipped because it can be reconstructed from the `local_number`

Products

Table "import.products"

This table contains the product_id used in the other product related tables and queries and a human readable name.

Column	Type	Collation	Nullable	Default
product_id	bigint
name	text

Table "import.product"

Column	Type	Collation	Nullable	Default
product_id	integer
local_number	character varying
report_type	character varying
report_date	date
qualification	character varying
country	character varying
age_group_lower_months	smallint
age_group_upper_months	smallint
age_group_reporter	character varying
is_child_report	boolean
sex	character varying
serious	boolean

Table "import.product_reaction"

Column	Type	Collation	Nullable	Default
product_id	integer
local_number	character varying
report_date	date
reaction	character varying
duration	character varying
outcome	character varying
seriousness_criteria	character varying

Table "import.product_literature"

Column	Type	Collation	Nullable	Default
product_id	integer
local_number	character varying
report_date	date
literature	character varying

Table "import.product_concomitant_drug_list"

Column	Type	Collation	Nullable	Default
product_id	integer
local_number	character varying
report_date	date
drugs	character varying
characteristic	character varying
indication	character varying
action	character varying
duration	character varying
dose	character varying
route	character varying

Table "import.product_subject_drug_list"

Column	Type	Collation	Nullable	Default
product_id	integer
local_number	character varying
report_date	date
drugs	character varying
characteristic	character varying
indication	character varying
action	character varying
duration	character varying
dose	character varying
route	character varying

Substances

Table "import.substances"

This table contains the substance_id used in the other substance related tables and queries and a human readable name.

Column	Type	Collation	Nullable	Default
substance_id	bigint
name	text

Table "import.substance"

Column	Type	Collation	Nullable	Default
substance_id	integer
local_number	character varying
report_type	character varying
report_date	date
qualification	character varying
country	character varying
age_group_lower_months	smallint
age_group_upper_months	smallint
age_group_reporter	character varying
is_child_report	boolean
sex	character varying
serious	boolean

Table "import.substance_reaction"

Column	Type	Collation	Nullable	Default
substance_id	integer
local_number	character varying
report_date	date
reaction	character varying
duration	character varying
outcome	character varying
seriousness_criteria	character varying

Table "import.substance_literature"

Column	Type	Collation	Nullable	Default
substance_id	integer
local_number	character varying
report_date	date
literature	character varying

Table "import.substance_concomitant_drug_list"

Column	Type	Collation	Nullable	Default
substance_id	integer
local_number	character varying
report_date	date
drugs	character varying
characteristic	character varying
indication	character varying
action	character varying
duration	character varying
dose	character varying
route	character varying

Table "import.substance_subject_drug_list"

Column	Type	Collation	Nullable	Default
substance_id	integer
local_number	character varying
report_date	date
drugs	character varying
characteristic	character varying
indication	character varying
action	character varying
duration	character varying
dose	character varying
route	character varying

Contact

We are working on a community platform. In the meanwhile, you can reach us by email at community@open-ema.eu or open an issue in our repository. To create issues on gitlab, you need a gitlab account. Please read the gitlab privacy statement for details about your account.

OpenEMA takes the PUBLICLY available data from EudraVigilance and makes it more user friendly. We are a PRIVATE entity, not affiliated with EMA.

About

The internet domain OpenEMA.org hosts a community supported open source project and is not affiliated in any way with any European Union Agency. The website documents how publicly available data can be opened and analysed, such as it is presented to Healthcare Professionals and the Public on the EMA adrreports.eu portal.

In the 'reports' section of the 'Analyses' chapter of docs.openema.org it is indicated that approx. 9.6 million unique reports are presented by EMA to Healthcare Professionals and the Public. In the 2022 Annual Report on EudraVigilance for the European Parliament, the Council and the Commission, however it is said there are some 15 million cases.

No conclusions can be drawn based on the subset of information that is the 9.6 million unique reports.

You can download CSV files and PostgreSQL database dumps from:

You may visit the European Medicines Agency website here

You may query the EudraVigilance system here

OpenEMA documentation