How can i transpose the following long data to the wide format?
data long;
input var $ value source;
cards;
aid 45 1
bid 46 1
aid 47 1
bid 48 1
aid 45 2
bid 46 2
aid 47 2
bid 48 2
; run;
data wide;
input aid bid source;
cards;
45 46 1
47 48 1
45 46 2
47 48 2
; run;
Works fine for me:
libname json json "C:\downloads\MyDrug_76de7937-695d-4981-84df-9cf34a029451.json" ;
data test;
row+1;
set json.alldata;
run;
proc print data=test(obs=10);
run;
OBS row P P1 V Value 1 1 1 DrugId 1 e88c72aa-f2b1-4e87-b17a-cdc069a4825c 2 2 1 BrandName 1 Pembrolizumab infusion 3 3 1 GenericName 1 null 4 4 1 Strength 1 null 5 5 1 Form 1 null 6 6 1 Route 1 null 7 7 1 Schedule 1 null 8 8 1 DoseUnit 1 null 9 9 1 IsCytotoxic 1 false 10 10 1 IsCytoStatic 1 false
But I don't think that is what you want to do if you are using the ALLDATA view since it already has multiple observations per JSON object.
If the JSON is well formed then try using something like this instead.
data test;
retain row;
set json.alldata;
if _n_=1 then key=p1;
retain key;
row + key=p1;
run;
proc print data=test(obs=5);
where p1=key;
run;
OBS row P P1 V Value key 1 1 1 DrugId 1 e88c72aa-f2b1-4e87-b17a-cdc069a4825c DrugId 28 2 1 DrugId 1 9e19974c-a373-4488-ade0-ea8e8cb171cf DrugId 55 3 1 DrugId 1 9e865e2f-ee0a-4616-bdfc-796cb0a835f4 DrugId 82 4 1 DrugId 1 4affd24c-a1c7-4c69-893f-ba1774f05bb3 DrugId 109 5 1 DrugId 1 9c02b467-5165-4fdf-baef-11309c5aa92b DrugId
Is it always a sequence of "aid" followed by "bid", containing identical values for source?
The aid and bid may change sequence or we can have more members to add, such as cid. Yes- all the id variables will have identical source.
So you must first define a clear rule for detecting the moment where you write a wide observation and reset the values of RETAINed variables to missing. Without an identifier for corresponding input observations, you can't use PROC TRANSPOSE, you'll need some tricky DATA step programming.
The data came from appending multiple JSON files. For each source there are same number of aid and bid.
But the issue here is that the combinations of source and var are not unique, so you need another rule which identifies the "groups" that should constitute a single wide observation.
Yes- that's the challenge. The id variables are not unique within the group of by variables. Can something be done using array?
Defining the rule must be done by you before we can start to code. You must define the logic first.
We need to have a row identifier as a combination of var set and source as shown below.
data long;
input var $ value source row;
cards;
aid 45 1 1
bid 46 1 1
aid 47 1 2
bid 48 1 2
aid 45 2 3
bid 46 2 3
aid 47 2 4
bid 48 2 4
; run;
Do you already have that row identifier?
No.
Then what would be the rule (in plain words) for assigning it?
As one can see in my last dataset, unique number should be assigned for each new repeat of the var and source.
This needs some tricky code:
data long;
input var $ value source;
cards;
aid 45 1
bid 46 1
aid 47 1
bid 48 1
aid 45 2
bid 46 2
aid 47 2
bid 48 2
;
proc sql noprint;
select distinct var into :columns separated by " " from long;
quit;
%put &columns;
%macro wide;
data wide;
set long;
by source;
array c {*} &columns;
retain
%do i = 1 %to %sysfunc(countw(&columns));
%scan(&columns,&i)
%end;
;
if first.source then call missing(of c{*});
select (var);
%do i = 1 %to %sysfunc(countw(&columns));
when ("%scan(&columns,&i)") do;
if %scan(&columns,&i) ne .
then do;
output;
call missing(of c{*});
end;
%scan(&columns,&i) = value;
end;
%end;
end;
if last.source and (
%do i = 1 %to %sysfunc(countw(&columns));
%if &i ne 1 %then or;
%scan(&columns,&i) ne .
%end;
)
then output;
drop var value;
run;
%mend;
%wide
Thanks for your time to generate the code. It works for numeric data with short length. But my actual data is a mixture of character and long integer and it doesn't work well.
My data:
aid | 11a3fc66-1f44-4421-bad5-7b0f053a8a19 | 1 | 1 |
bid | 1494141000168105 | 1 | 1 |
cid | 1494131000168101 | 1 | 1 |
aid | 11a3fc66-1f44-4421-bad5-7b0f053a8a19 | 1 | 2 |
bid | 1494131000168659 | 1 | 2 |
cid | 1494131000168333 | 1 | 2 |
aid | 11a3fc66-1f44-4421-bad5-7b0f053a8a19 | 2 | 3 |
bid | 2154131000168101 | 2 | 3 |
cid | 88794131000168101 | 2 | 3 |
aid | 11a3fc66-1f44-4421-bad5-7b0f053a8a19 | 2 | 4 |
bid | 1598731000168101 | 2 | 4 |
cid | 1489751000168101 | 2 | 4 |
when i use your code the resulting wide table looks like:
1 | 2 | . | 1.4941E15 | 1.4941E15 |
1 | 2 | . | 1.4941E15 | 1.4941E15 |
2 | 4 | . | 2.1541E15 | 8.8794E16 |
2 | 4 | . | 1.5987E15 | 1.4898E15 |
If i use proc transpose with source and row variables in the by statement it works fine. Thus I just need help to generate the row variable.
1 | 1 | 11a3fc66-1f44-4421-bad5-7b0f053a8a19 | 1494141000168105 | 1494131000168101 |
1 | 2 | 11a3fc66-1f44-4421-bad5-7b0f053a8a19 | 1494131000168659 | 1494131000168333 |
2 | 1 | 11a3fc66-1f44-4421-bad5-7b0f053a8a19 | 2154131000168101 | 88794131000168101 |
2 | 2 | 11a3fc66-1f44-4421-bad5-7b0f053a8a19 | 1598731000168101 | 1489751000168101 |
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.