Unique Reports
We want to know about unique reports. The key identifying value of a report is the local number. We asumed these to be unique but reports can be updated or even deleted over time. We noticed that when updates take place, another report with the same local number can appear in the downloads. this can be across multiple line listings so it is a bit hard to detect when using the EMA portal, but we discovered they are present. Besides that, reports are overlapping between substance line listings and product line listings. Our first course of action is to grab all local number's and isolate them in a table. We use a table to increase performance later on when we want to access unique reports again.
Add a table schema
Because we do not want to polute the public
and import
schema's, we create a new schema to create derived tables.
CREATE SCHEMA analyses;
Number of reports
First, we select all unique reports from the substance table and create a new table with the results.
CREATE TABLE analyses.icsr_temp AS
SELECT
local_number,
max(report_date) AS report_date,
max(country) AS country
FROM import.substance
GROUP BY local_number;
This should render 10187759
results
Then we insert the unique reports from the product table into the table generated in the previous query.
INSERT INTO analyses.icsr_temp
SELECT
local_number,
max(report_date) AS report_date,
max(country) AS country
FROM import.product
GROUP BY local_number;
Which should add another 3106773
into the icsr
table.
Then we copy the reports into a table that merges the unique reports from the substance table and the product table so it cannot contain double records.
CREATE TABLE analyses.icsr AS
SELECT
local_number,
max(report_date) AS report_date
FROM analyses.icsr_temp
GROUP BY local_number;
This should render 10210452
results.
We can now drop the temporary table.
DROP TABLE analyses.icsr_temp;
Let's see how much unique reports remain.
The final result is that we have 10210452
unique reports in the analyses.icsr
table.
EMA reports 25.3 million Case Safety Reports (that represent 15 million cases) in the following document.
21 March 2023, EMA/900566/2022, H-Division, 2022 Annual Report on EudraVigilance for the European Parliament, the Council and the Commission, Reporting period: 1 January to 31 December 2022.
What we found is that the resulting table contains 10210452
rows, approx. 10 million unique reports.
Unique reports per year
We can also count the number of unique reports per year running the following query.
SELECT
extract(year from date_trunc('year', report_date)) AS "year",
count(*)
FROM analyses.icsr
GROUP BY date_trunc('year', report_date);
This should render:
year | count |
---|---|
2002 | 35 |
2003 | 11053 |
2004 | 19512 |
2005 | 77431 |
2006 | 124033 |
2007 | 157358 |
2008 | 465759 |
2009 | 331834 |
2010 | 244258 |
2011 | 289149 |
2012 | 415608 |
2013 | 479409 |
2014 | 485101 |
2015 | 479275 |
2016 | 452669 |
2017 | 592893 |
2018 | 732772 |
2019 | 761480 |
2020 | 675657 |
2021 | 1475566 |
2022 | 1518932 |
2023 | 420668 |
(22 rows)
From this table we can see that the database starts to really receive reports in 2003, reaching a tenfold of reports from 2006, seeing a steep rise in 2008 and eventually reaching 100 times the number of reports from the early 2000's in 2021 declining again in 2022 and given our reports were downloaded in July, our careful estimate is that this decline will probably continue in 2023.