09 January 2018

JSON_TABLE and the Top 2000

Oracle database 12c provides native JSON parsing. It is relatively easy to go from data in a JSON format to a relational representation.
In this example I will use data from the Top 2000. Every year in The Netherlands a playlist is produced by popular vote and fully broadcast between Christmas and the last day of the year, with the number one being played just before the New Year. You can see the complete list of 2017 on the official NPO website.
The Top 2000 list that I'll use for this example is the one from 2017.

The data from the JSON file looks like this:

Only the first part of the JSON file is shown, with the first two songs.
At the highest level there are three name-value pairs ("success", "message" and "messages") and an array named "data".
The "data" array contains another array with JSON objects containing information about the songs.
Each of these JSON objects contain name-value pairs, with very short none descriptive names, like "s" and "a". You might guess what these names would mean.
Even though the "data" attribute is a nested array, at the top level it is only one level deep.
The complete JSON-file can be downloaded by following this link.

At first I want to get the JSON file into the database, before I extract the values that I need.
First create the table and a check constraint to make sure that is JSON.

create table temp2000 
(complete_list clob);

alter table temp2000
add constraint list_is_json check (complete_list is json)
/
This table will hold the complete JSON file containing the Top2000. The check constraint on the column will verify that only correctly formatted JSON is allowed.
Now add the JSON-file to the table, the easiest way to do this is by adding a row using SQL Developer and copy-paste the complete JSON.

What I want to extract from the complete JSON file is the information about the artist, songtitle, release year, current position in the list, as well as the position in the list from last year.
My guess would be that the "a", "s", "yr", "pos", "prv" are the attributes that I need.
With a CTAS (Create Table as SELECT) and the JSON_TABLE operator I can transform the JSON to relational representation.

create table top2000
as
select songtitle
      ,artist
      ,release_year
      ,curr_position
      ,prev_position
  from temp2000 t
      ,json_table (t.complete_list format json, '$.data[0][*]'
         columns (
           songtitle     varchar2(150) path '$.s'
          ,artist        varchar2(150) path '$.a'
          ,release_year  number path '$.yr'
          ,curr_position number path '$.pos'
          ,prev_position number path '$.prv'
         )
      )
/
Because the song information is contained in the "data" array, and only in the nested array, I need to address that array as follows:
'$.data[0][*]'
Zero (in square brackets) representing the first array in the "data" attribute, and a wildcard (the asterisk) to address the other objects in the nested array.

To help with the discovery of the structure (and the paths to the values) of the JSON, Oracle Database 12c Release 2 introduced the JSON_DATAGUIDE function. Unfortunatelly I don't have Release 2 at my disposal right now, so I leave that for a later time.

Links