Solved: Re: Transposing to wide format - Page 2

Tom · Posted 04-26-2023 11:54 AM

What is the RULE that says when to wrap them?

Do you just want to take pairs?

data long;
  input var $ value source;
cards;
aid 45 1
bid 46 1
aid 47 1
bid 48 1
aid 45 2
bid 46 2
aid 47 2
bid 48 2
; 

data long_pairs;
  set long;
  group + (1=mod(_n_,2));
run;

proc transpose data=long_pairs out=wide(drop=_name_);
  by group source;
  id var;
  var value;
run;

proc print;
run;

OBS    group    source    aid    bid

 1       1         1       45     46
 2       2         1       47     48
 3       3         2       45     46
 4       4         2       47     48

bayzid · Posted 04-26-2023 04:51 PM

The rule is each appearance of all the distinct values in var by source. If we have 3 distinct values in var then the last digit needs to be 3 to convert into long_pair. Thus, we need to find out the number of distinct values in "var" for each dataset.

data long;
input var $ value $36. source row;
cards;
aid 11a3fc66-1f44-4421-bad5-7b0f053a8a19 1 1
bid 1494141000168105                     1 1
cid 1494131000168101                     1 1
aid 11a3fc66-1f44-4421-bad5-7b0f053a8a19 1 2
bid 1494131000168659                     1 2
cid 1494131000168333                     1 2
aid 11a3fc66-1f44-4421-bad5-7b0f053a8a19 2 3
bid 2154131000168101                     2 3
cid 88794131000168101                    2 3
aid 11a3fc66-1f44-4421-bad5-7b0f053a8a19 2 4
bid 1598731000168101                     2 4
cid 1489751000168101                     2 4
; run;

data long_pairs;
  set long;
  group + (1=mod(_n_,3));
run;

proc transpose data=long_pairs out=wide(drop=_name_);
  by group source;
  id var;
  var value;
run;

I am trying to avoid to find out the number of distinct values.

Reeza · Posted 04-26-2023 12:19 PM

Go back and check the JSON files. Maybe you shouldn't be working with the all data files but different files. Or there should be something that identifies each group uniquely. Or if it's from different files you can add in each file name.

bayzid · Posted 04-26-2023 04:27 PM

The file name is being saved in the Master_&group datasets but that is not enough to uniquely identify the rows to reshape into wide. The JSON files do not have any column as row identifier but it appears on the left of each file as shown below (1-6).

We need to find some way to import that information from each file as a new column in the Master_&group dataset.

Reeza · Posted 04-26-2023 04:48 PM

In JSON the brackets denote the series of information and group it, which is why I'm suggesting you look at the json mapped data and see if a different file has the data in the format you need.

Those numbers are just line numbers in the text editor. And if you changed the view you could change the line numbers. It's not something you can rely on.

bayzid · Posted 04-26-2023 04:55 PM

The row numbers that appear in the text editor would be fine for the purpose of transposing the data to wide format.

Tom · Posted 04-26-2023 04:57 PM

If you read the JSON file using the JSON engine then is should have made an ORDINAL variable you can use.

If you read the JSON file with your own data step then just make your own counter variables.

data want;
  infile 'file.json';
  row+1;
  ....
run;

bayzid · Posted 04-26-2023 07:30 PM

I am using the row+; in the following code but it is not generating the row variable as the JSON engine would make the ordinal variable.

filename dirtree url 
    'https://raw.githubusercontent.com/sasutils/macros/master/dirtree.sas';
%include dirtree /source2;
%dirtree(C:\Users\mq10004085\Macquarie University\MRFF Aged care Data - Documents\json_test    /* Pipe delimited directory list (default=.) */
        , out=filelist  /* Output dataset name */
        , maxdepth=1 /* Maximum tree depth */);
%macro read_json(myfile=, group = );
    filename mydata &myfile.;
    libname myjson JSON fileref=mydata nrm;;
    proc datasets lib=myjson; 
    quit;
    *how to parse the file has not been specified - looks like you need to specify the lengths;
    data file2save;
        length source $200. p1 $256. Value $256.;
        set myjson.alldata;
		        source=&myfile.; 
           	row+1;  run;
    proc append base=master_&group data=file2save force;  run;
    proc sql;
        drop table file2save;
    quit;
    filename mydata;
    libname myjson;
%mend read_json;
proc sql;
    drop table master;
quit;
data run_list;
    set filelist;
    where type='F';
   *add other logic to add group variable, this is just an example;
    group=substr(filename, 1, find(filename,'_'));
*test this is created correctly;
    str=catt('%read_json(myfile="C:\Users\mq10004085\Macquarie University\MRFF Aged care Data - Documents\json_test\', filename, '", group=', group,  ');');
  *once tests are complete uncomment the execution to test it;  
 call execute(str);
run;

This is how it looks like.

Tom · Posted 04-26-2023 08:33 PM

Works fine for me:

libname json json "C:\downloads\MyDrug_76de7937-695d-4981-84df-9cf34a029451.json" ;
data test;
 row+1;
 set json.alldata;
run;

proc print data=test(obs=10);
run;

OBS    row    P    P1              V    Value

  1      1    1    DrugId          1    e88c72aa-f2b1-4e87-b17a-cdc069a4825c
  2      2    1    BrandName       1    Pembrolizumab infusion
  3      3    1    GenericName     1    null
  4      4    1    Strength        1    null
  5      5    1    Form            1    null
  6      6    1    Route           1    null
  7      7    1    Schedule        1    null
  8      8    1    DoseUnit        1    null
  9      9    1    IsCytotoxic     1    false
 10     10    1    IsCytoStatic    1    false

But I don't think that is what you want to do if you are using the ALLDATA view since it already has multiple observations per JSON object.

If the JSON is well formed then try using something like this instead.

data test;
  retain row;
  set json.alldata;
  if _n_=1 then key=p1;
  retain key;
  row + key=p1;
run;

proc print data=test(obs=5);
 where p1=key;
run;

OBS    row    P      P1      V                   Value                     key

  1     1     1    DrugId    1    e88c72aa-f2b1-4e87-b17a-cdc069a4825c    DrugId
 28     2     1    DrugId    1    9e19974c-a373-4488-ade0-ea8e8cb171cf    DrugId
 55     3     1    DrugId    1    9e865e2f-ee0a-4616-bdfc-796cb0a835f4    DrugId
 82     4     1    DrugId    1    4affd24c-a1c7-4c69-893f-ba1774f05bb3    DrugId
109     5     1    DrugId    1    9c02b467-5165-4fdf-baef-11309c5aa92b    DrugId

BayzidurRahman · Posted 04-27-2023 02:37 AM

Thanks Tom.

That worked as expected.

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!