Solved: Re: Transposing to wide format

bayzid · Posted 04-25-2023 08:22 PM

How can i transpose the following long data to the wide format?

data long;
input var $ value source;
cards;
aid 45 1
bid 46 1
aid 47 1
bid 48 1
aid 45 2
bid 46 2
aid 47 2
bid 48 2
; run;

data wide;
input aid bid source;
cards;
45 46 1
47 48 1
45 46 2
47 48 2
; run;

Tom · Posted 04-26-2023 08:33 PM

Works fine for me:

libname json json "C:\downloads\MyDrug_76de7937-695d-4981-84df-9cf34a029451.json" ;
data test;
 row+1;
 set json.alldata;
run;

proc print data=test(obs=10);
run;

OBS    row    P    P1              V    Value

  1      1    1    DrugId          1    e88c72aa-f2b1-4e87-b17a-cdc069a4825c
  2      2    1    BrandName       1    Pembrolizumab infusion
  3      3    1    GenericName     1    null
  4      4    1    Strength        1    null
  5      5    1    Form            1    null
  6      6    1    Route           1    null
  7      7    1    Schedule        1    null
  8      8    1    DoseUnit        1    null
  9      9    1    IsCytotoxic     1    false
 10     10    1    IsCytoStatic    1    false

But I don't think that is what you want to do if you are using the ALLDATA view since it already has multiple observations per JSON object.

If the JSON is well formed then try using something like this instead.

data test;
  retain row;
  set json.alldata;
  if _n_=1 then key=p1;
  retain key;
  row + key=p1;
run;

proc print data=test(obs=5);
 where p1=key;
run;

OBS    row    P      P1      V                   Value                     key

  1     1     1    DrugId    1    e88c72aa-f2b1-4e87-b17a-cdc069a4825c    DrugId
 28     2     1    DrugId    1    9e19974c-a373-4488-ade0-ea8e8cb171cf    DrugId
 55     3     1    DrugId    1    9e865e2f-ee0a-4616-bdfc-796cb0a835f4    DrugId
 82     4     1    DrugId    1    4affd24c-a1c7-4c69-893f-ba1774f05bb3    DrugId
109     5     1    DrugId    1    9c02b467-5165-4fdf-baef-11309c5aa92b    DrugId

View solution in original post

Kurt_Bremser · Posted 04-26-2023 01:34 AM

Is it always a sequence of "aid" followed by "bid", containing identical values for source?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

bayzid · Posted 04-26-2023 01:38 AM

The aid and bid may change sequence or we can have more members to add, such as cid. Yes- all the id variables will have identical source.

Kurt_Bremser · Posted 04-26-2023 01:55 AM

So you must first define a clear rule for detecting the moment where you write a wide observation and reset the values of RETAINed variables to missing. Without an identifier for corresponding input observations, you can't use PROC TRANSPOSE, you'll need some tricky DATA step programming.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

bayzid · Posted 04-26-2023 02:07 AM

The data came from appending multiple JSON files. For each source there are same number of aid and bid.

Kurt_Bremser · Posted 04-26-2023 02:12 AM

But the issue here is that the combinations of source and var are not unique, so you need another rule which identifies the "groups" that should constitute a single wide observation.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

BayzidurRahman · Posted 04-26-2023 02:18 AM

Yes- that's the challenge. The id variables are not unique within the group of by variables. Can something be done using array?

Kurt_Bremser · Posted 04-26-2023 02:21 AM

Defining the rule must be done by you before we can start to code. You must define the logic first.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

bayzid · Posted 04-26-2023 03:11 AM

We need to have a row identifier as a combination of var set and source as shown below.

data long;
input var $ value source row;
cards;
aid 45 1 1
bid 46 1 1
aid 47 1 2
bid 48 1 2
aid 45 2 3
bid 46 2 3
aid 47 2 4
bid 48 2 4
; run;

Kurt_Bremser · Posted 04-26-2023 03:37 AM

Do you already have that row identifier?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

bayzid · Posted 04-26-2023 04:23 AM

No.

Kurt_Bremser · Posted 04-26-2023 04:51 AM

Then what would be the rule (in plain words) for assigning it?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

bayzid · Posted 04-26-2023 05:33 AM

As one can see in my last dataset, unique number should be assigned for each new repeat of the var and source.

Kurt_Bremser · Posted 04-26-2023 08:34 AM

This needs some tricky code:

data long;
input var $ value source;
cards;
aid 45 1
bid 46 1
aid 47 1
bid 48 1
aid 45 2
bid 46 2
aid 47 2
bid 48 2
;

proc sql noprint;
select distinct var into :columns separated by " " from long;
quit;
%put &columns;
%macro wide;

data wide;
set long;
by source;
array c {*} &columns;
retain
%do i = 1 %to %sysfunc(countw(&columns));
  %scan(&columns,&i)
%end;
;
if first.source then call missing(of c{*});
select (var);
%do i = 1 %to %sysfunc(countw(&columns));
  when ("%scan(&columns,&i)") do;
    if %scan(&columns,&i) ne .
    then do;
      output;
      call missing(of c{*});
    end;
    %scan(&columns,&i) = value;
  end;
%end;
end;
if last.source and (
%do i = 1 %to %sysfunc(countw(&columns));
  %if &i ne 1 %then or;
  %scan(&columns,&i) ne .
%end;
)
then output;
drop var value;
run;

%mend;

%wide

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

bayzid · Posted 04-26-2023 09:33 AM

Thanks for your time to generate the code. It works for numeric data with short length. But my actual data is a mixture of character and long integer and it doesn't work well.

My data:

Obs var value source row123456789101112

aid	11a3fc66-1f44-4421-bad5-7b0f053a8a19	1	1
bid	1494141000168105	1	1
cid	1494131000168101	1	1
aid	11a3fc66-1f44-4421-bad5-7b0f053a8a19	1	2
bid	1494131000168659	1	2
cid	1494131000168333	1	2
aid	11a3fc66-1f44-4421-bad5-7b0f053a8a19	2	3
bid	2154131000168101	2	3
cid	88794131000168101	2	3
aid	11a3fc66-1f44-4421-bad5-7b0f053a8a19	2	4
bid	1598731000168101	2	4
cid	1489751000168101	2	4

when i use your code the resulting wide table looks like:

Obs source row aid bid cid1234

1	2	.	1.4941E15	1.4941E15
1	2	.	1.4941E15	1.4941E15
2	4	.	2.1541E15	8.8794E16
2	4	.	1.5987E15	1.4898E15

If i use proc transpose with source and row variables in the by statement it works fine. Thus I just need help to generate the row variable.

Obs source row aid bid cid1234

1	1	11a3fc66-1f44-4421-bad5-7b0f053a8a19	1494141000168105	1494131000168101
1	2	11a3fc66-1f44-4421-bad5-7b0f053a8a19	1494131000168659	1494131000168333
2	1	11a3fc66-1f44-4421-bad5-7b0f053a8a19	2154131000168101	88794131000168101
2	2	11a3fc66-1f44-4421-bad5-7b0f053a8a19	1598731000168101	1489751000168101

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away