Column

COVID-19 Ad hoc Analysis (6)

July 17, 2020
Taiga Nielsen (Taiga Nielsen@NTT DATA)
Jun Shiromizu (Jun Shiromizu@NTT DATA)

Using COVID-19 data as an example, this article explains a data aggregation methodology for ingesting a variety of datasets and analyzing them ad hoc.

(6) Converting Johns Hopkins University Datasets to the HyperCube format

The last article gave an overview of the HyperCube data model and explained how versatile it is. This article focuses on how to actually create a HyperCube from the COVID-19 datasets provided by Johns Hopkins University (JHU).

Process 1: Data Modeling

The HyperCube creation process has 2 steps:

  1. Data Modeling
  2. Data Conversion

First, we explain step 1, Data Modeling.

Defining the Concept and Dimension

The first step of data modeling for the HyperCube is to find out what kinds of Concepts and Dimensions the Fact data holds, and to define them.

The COVID-19 datasets that JHU provides contain the following information:

  • When (year, month, day)
  • Which country
  • Which state (U.S. and China only)
  • Which prefecture (U.S. only)
  • The type of situation status: number of infected, number of deaths, number of recoveries, number of active COVID-19 cases
  • The total number for each of the 4 types of situation status above

The [Concept] in the HyperCube is the situation status: number of infected, number of deaths, number of recoveries, or number of active COVID-19 cases.

The [Value] is the amount for each [Concept], that is, the total number for each of the 4 types of situation status above.

The [Dimension] covers the first 4 items: "When", and "Where" (which country, state, and prefecture).
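As a rough illustration of the model described above, one Fact could be written as the following Python structure. This is only a sketch: the class name, field names, and Concept labels are our own assumptions for illustration and are not the definitions used inside Data Discovery, and the country and number are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical Concept labels, chosen for this sketch only.
CONCEPTS = ("Infected", "Deaths", "Recovered", "Active")


@dataclass(frozen=True)
class Fact:
    concept: str                      # Concept: which situation status is counted
    value: int                        # Value: total number for that Concept
    period: date                      # Dimension: when
    country: str                      # Dimension: where (country)
    state: Optional[str] = None       # Dimension: where (state), if present
    prefecture: Optional[str] = None  # Dimension: where (prefecture), if present

    def __post_init__(self):
        # Reject Concepts that were not defined in the model.
        if self.concept not in CONCEPTS:
            raise ValueError(f"unknown concept: {self.concept}")


# Made-up example values, not taken from the JHU data.
example = Fact("Deaths", 5, date(2020, 7, 17), "Exampleland")
```

Each Fact holds exactly one Concept and one Value, and every other field is a Dimension that says when and where that Value applies.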

In our Data Discovery solution, we define the Concepts and Dimensions by downloading an Excel sheet from the Taxonomy Maintenance screen. The definitions written in the Excel sheet are then uploaded through the screen and imported into the system.

Process 2: Data Conversion

Let's convert the data according to the model defined above.

The image below shows the outcome. Each row of the original data produces 4 HyperCube [Facts], because a row holds 4 types of Concepts.

This conversion could also be done in the database with a query, but Data Discovery provides a GUI-based conversion function on the "Create HyperCube" screen.
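For readers who want to see the idea of the conversion outside the GUI, the following Python sketch turns one row of the original data into 4 Facts, one per Concept. It is not the Data Discovery implementation. The column names, the Concept labels, and the sample numbers are assumptions made for this illustration, and the real JHU files may use different column names.

```python
# Assumed mapping from source columns to Concept labels (hypothetical names).
CONCEPT_COLUMNS = {
    "Confirmed": "Infected",
    "Deaths": "Deaths",
    "Recovered": "Recovered",
    "Active": "Active",
}

# Assumed names of the Dimension columns in the source row.
DIMENSION_COLUMNS = ("Period", "Country", "State", "Prefecture")


def row_to_facts(row):
    """Turn one wide source row into one Fact per Concept column."""
    dimensions = {name: row.get(name) for name in DIMENSION_COLUMNS}
    facts = []
    for column, concept in CONCEPT_COLUMNS.items():
        fact = dict(dimensions)           # every Fact keeps the same Dimensions
        fact["Concept"] = concept
        fact["Value"] = int(row[column])  # the total number for this Concept
        facts.append(fact)
    return facts


# Made-up numbers, for illustration only.
sample_row = {
    "Period": "2020-07-17", "Country": "Exampleland",
    "State": None, "Prefecture": None,
    "Confirmed": 100, "Deaths": 5, "Recovered": 60, "Active": 35,
}
facts = row_to_facts(sample_row)
```

The 4 resulting Facts share the same Dimensions and differ only in their Concept and Value.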

Data Analytics using HyperCube

With the JHU datasets now in HyperCube format, Data Discovery users can run Pivot Analysis on the [HyperCube Analytics] screen in a browser.

For example, if the user sets [Concept] (number of infected or deaths) and [Country] on the vertical axis and [Period] on the horizontal axis, the pivot table shown below is displayed.

If the user instead sets [Period] on the vertical axis and [Country] and [Concept] on the horizontal axis, the table below is displayed. In this way, users are free to rearrange any items on the table, which makes it easy to analyze multiple Concepts.
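The reason the axes can be swapped so freely is that every Fact carries its Concept and Dimensions as plain attributes. The Python sketch below shows the idea with a small generic pivot function. It is not how the [HyperCube Analytics] screen is implemented, and the function name, key names, and numbers are assumptions made for this illustration.

```python
from collections import defaultdict


def pivot(facts, row_keys, col_keys):
    """Sum Value into a nested dict: table[row_tuple][col_tuple]."""
    table = defaultdict(lambda: defaultdict(int))
    for fact in facts:
        row = tuple(fact[key] for key in row_keys)
        col = tuple(fact[key] for key in col_keys)
        table[row][col] += fact["Value"]
    return {row: dict(cols) for row, cols in table.items()}


# Made-up Facts, for illustration only.
facts = [
    {"Concept": "Infected", "Country": "A", "Period": "2020-07-16", "Value": 90},
    {"Concept": "Infected", "Country": "A", "Period": "2020-07-17", "Value": 100},
    {"Concept": "Deaths", "Country": "A", "Period": "2020-07-16", "Value": 4},
    {"Concept": "Deaths", "Country": "A", "Period": "2020-07-17", "Value": 5},
    {"Concept": "Infected", "Country": "B", "Period": "2020-07-17", "Value": 30},
]

# Vertical axis: Concept and Country. Horizontal axis: Period.
by_concept_country = pivot(facts, ["Concept", "Country"], ["Period"])

# Same Facts with the axes swapped: Period against Country and Concept.
by_period = pivot(facts, ["Period"], ["Country", "Concept"])
```

Changing the layout only changes which keys go into `row_keys` and `col_keys`. The Facts themselves stay untouched.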

It is also possible to show a line graph of the numbers of infected and deaths over a period of time.

Summary: Versatile analytics using the HyperCube model

This article explained how to convert the JHU datasets into the HyperCube format to provide web-based "versatile analytics".

Some readers may think: "We understand that the data can be pivoted in the browser, but what does that have to do with the HyperCube format? Surely we can pivot datasets without converting them into the HyperCube format."

That is correct. The real strength of the HyperCube lies in "analyzing datasets that come from different origins."

The next article will focus on the effects of connecting different datasets to the JHU COVID-19 datasets.

Contact

NTT DATA Corporation
ABLER Promotion Group,
Financial Global IT Services Division,
First Financial Sector