Re: Using an array to transform data from long to wide

zaystudent · Posted 07-06-2022 03:38 PM

Hi I'm trying to use an array to transform my dataset and character variables. Currently my data looks has the following column titles:

Drug Name. Reason. Onset. End. Status.

This data has approximately 13,000 rows and I want it to look like this

Drug Name. Reason1. Onset1. End1. Status1. Reason2. Onset2. End2. Status3. Reason3. Onset3. End3. Status3.

This way the data will be long and each drug may have a different row length. Currently, my code is as follows:

data want;
set want1;
by Name;
retain reas1 reas2 reas3 reas4 reas5 reas6 reas7 reas8 reas9 reas10
on1 on2 on3 on4 on5 on6 on7 on8 on9 on10
endd1 endd2 endd3 endd4 endd5 endd6 endd7 endd8 endd9 endd10
stat1 stat2 stat3 stat4 stat5 stat6 stat7 stat8 stat9 stat10;
array reas(10) reas1 reas2 reas3 reas4 reas5 reas6 reas7 reas8 reas9 reas10;
array on(10) on1 on2 on3 on4 on5 on6 on7 on8 on9 on10;
array endd(10) endd1 endd2 endd3 endd4 endd5 endd6 endd7 endd8 endd9 endd10;
array stat(10) stat1 stat2 stat3 stat4 stat5 stat6 stat7 stat8 stat9 stat100;
if first.name then count=1;
else count+1;
do i=1 to 10;
if first.name then do;
reas(i)='.';
on(i)='.';
endd(i)='.';
stat(i)='.';
end;
end;
reas(count)=Reason;
on(count)=Onset;
endd(count)=End;
stat(count)=Status;
if last.name;
run;

However, this isn't working for me. Any suggestions or feedback would be really appreciated!!

mkeintz · Posted 07-06-2022 04:27 PM

Let's say there are 60 possible reason values, but no drug has more than 10 reasons listed (i.e. no more than 10 observations per drug). But you presumably want one obs per drug with (up to) 10 reason variables (and 10 status, end, and onset variables).

Then your client wants a frequency of reasons. The transformation you want means that a particular reason may appear in any one of the 10 reason variables. What this proposed transformation does is to create an analysis-phobic data structure.

Why?

Now, perhaps this is not being done for analysis, but rather just to make it easier for a person to look at the data, with one "line" per drug, while preserving the original data set file.

If so, then I would suggest making a data set VIEW from the original data set FILE, structured per your request. Every time you access the view, it would retrieve the original data set file and present it in the layout you propose. The advantage is (1) you still have the original data for analysis, and (2) any change in values or addition/removal of observations in the original data set would be reflected every time you access the derived view without re-coding.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Kurt_Bremser · Posted 07-06-2022 04:29 PM

"Isn't working" tells us next to nothing. Post the log, and if the step runs without ERRORs or WARNINGs tell us where the result deviates from your expectations.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kevin_Leonte · Posted 07-06-2022 04:30 PM

This is very close. You don't need to the do loop. The do loop is telling SAS to create the 10 variables for EACH record, which I don't believe you want.

I made up the following data:

Your modified code worked fine for me on that data:

data want (drop=reason onset end status count);
set want1;
by Name;
retain reas1 reas2 reas3 reas4 reas5 reas6 reas7 reas8 reas9 reas10
on1 on2 on3 on4 on5 on6 on7 on8 on9 on10
endd1 endd2 endd3 endd4 endd5 endd6 endd7 endd8 endd9 endd10
stat1 stat2 stat3 stat4 stat5 stat6 stat7 stat8 stat9 stat10;
array reas(10) reas1 reas2 reas3 reas4 reas5 reas6 reas7 reas8 reas9 reas10;
array on(10) on1 on2 on3 on4 on5 on6 on7 on8 on9 on10;
array endd(10) endd1 endd2 endd3 endd4 endd5 endd6 endd7 endd8 endd9 endd10;
array stat(10) stat1 stat2 stat3 stat4 stat5 stat6 stat7 stat8 stat9 stat10;
if first.name then count=1;
else count+1;
reas(count)=Reason;
on(count)=Onset;
endd(count)=End;
stat(count)=Status;

if last.name;
run;

Also, if you want the order to be reason1 onset1 end1 status1 count1, etc... then you have to specify that order in the retain statement.

Reeza · Posted 07-06-2022 04:41 PM

You do need to reset the values otherwise RETAIN will hold the previous values in the array and you're carrying over wrong information from row to row.

Tom · Posted 07-06-2022 11:40 PM

Do you care if the new variables are grouped by the original NAME instead of by the original ROW?

If not then just use PROC SUMMARY to make the dataset. You will need to set an upper bound on the number of rows per NAME you want to transpose. In your example you used 10 as this limit. (If you want you could find the number in advance and put it into a macro variable.)

proc summary data=HAVE ;
  by NAME ;
  output out=want idgroup (out[10] (Reason Onset End Status)=);
run;

Since you didn't provide any data let's make our own example to demonstrate.

data have;
  input id x y ;
cards;
1 1 2
1 2 3
1 4 5
2 6 7
2 8 9
;

proc summary data=have;
  by id;
  output out=want idgroup (out[4] (x y)=);
run;

Results:

Using an array to transform data from long to wide