I am transposing a huge dataset of patients' clinical data. Each row contains the information of an operation number, code and date in a single visit of one patient. I have a functioning code that transposes the operation number, code and date into variables.
The whole thing looks like this:
DATA mydata;
FORMAT OP_date DDMMYYP10.;
input id visit op$ OPcode$ OP_date DDMMYY10.;
datalines;
580 20 14 x5-135 13.04.2011
1663 1 1 x5-124 17.12.2004
1663 1 2 x5-344 17.12.2004
1663 2 1 x5-424 .
...... etc.
;
run;
PROC SORT DATA=mydata;
BY ID visit OP;
RUN;
DATA transC;
DO UNTIL (LAST.ID);
Retain ID V1OP1C V1OP1D V1OP2C etc ..;
FORMAT V1OP1D DDMMYYP10. V1OP2D DDMMYYP10. etc... ;
SET mydata;
BY ID visit;
IF visit="1" AND OP="1" THEN DO; V1OP1C=OPcode; V1OP1D=OP_date;END;
IF visit="1" AND OP="2" THEN DO; V1OP2C=OPcode; V1OP2D=OP_date;END;
IF visit="2" etc ...
DROP visit OP OPcode OP_date;
RUN;
I would like to generate the variable names automatically in the IF statement so I wouldn't have to write myself all the combinations of all Visit and OP and also automatically generate the names of the new varibales.
It should be something like this:
The FORMAT statement:
FORMAT VX_OPYdate DDMMYYP10. etc... ;
The IF statement:
IF visit=X AND OP=Y THEN DO; VX_OPYcode=OPcode; VX_OPYdate=OP_datum; END;
I have tried it this with arrays, but it didn't work. Also my experience is limited, so any suggestions to other parts of the code are welcome 🙂
Thanks in forward
Ubai.
For writing repeating code, the macro language provides the necessary toolset:
data mydata;
format OP_date DDMMYYP10.;
input id visit op$ OPcode$ OP_date DDMMYY10.;
datalines;
580 20 14 x5-135 13.04.2011
1663 1 1 x5-124 17.12.2004
1663 1 2 x5-344 17.12.2004
1663 2 1 x5-424 .
;
run;
proc sort data=mydata;
by ID visit OP;
run;
%macro repeat_code(max_v,max_o);
data transC;
format ID 10.
%do i = 1 %to &max_v;
%do j = 1 %to &max_o;
V&i.OP&j.C $8.
V&i.OP&j.D DDMMYYP10.
%end;
%end;
;
do until (last.ID);
set mydata;
by ID;
%do i = 1 %to &max_v;
%do j = 1 %to &max_o;
if visit = &i and OP = "&j"
then do;
V&i.OP&j.C = OPcode;
V&i.OP&j.D = OP_date;
end;
%end;
%end;
end;
drop visit OP OPcode OP_date;
run;
%mend;
%repeat_code(20,14)
For writing repeating code, the macro language provides the necessary toolset:
data mydata;
format OP_date DDMMYYP10.;
input id visit op$ OPcode$ OP_date DDMMYY10.;
datalines;
580 20 14 x5-135 13.04.2011
1663 1 1 x5-124 17.12.2004
1663 1 2 x5-344 17.12.2004
1663 2 1 x5-424 .
;
run;
proc sort data=mydata;
by ID visit OP;
run;
%macro repeat_code(max_v,max_o);
data transC;
format ID 10.
%do i = 1 %to &max_v;
%do j = 1 %to &max_o;
V&i.OP&j.C $8.
V&i.OP&j.D DDMMYYP10.
%end;
%end;
;
do until (last.ID);
set mydata;
by ID;
%do i = 1 %to &max_v;
%do j = 1 %to &max_o;
if visit = &i and OP = "&j"
then do;
V&i.OP&j.C = OPcode;
V&i.OP&j.D = OP_date;
end;
%end;
%end;
end;
drop visit OP OPcode OP_date;
run;
%mend;
%repeat_code(20,14)
Dear Kurt,
thanks for the code. It helped alot and I did few steps afterwards. However, I was tryign to improve the code and shorten unnecessary parts. You have used the FORMAT statement in the macro. It keeps giving me the following warning:
You will have to post the code for inspection.
It's the code you mentioned in your answer in this thread. In the macro code, third line you used a FORMAT statement. This is the statement I meant.
It's missing a semi-colon.
EDIT: nope, but I suspect that's where the change happened to create the error anyways - you likely missed a semi-colon in the conversion or it needs an extra one for some reason.
Ok, I guess I found it out.
This ID variable in my code is a character variable that includes numbers as well. That's why it does have multiple lengths.
Use two proc transposes and merge the data sets.
http://www.ats.ucla.edu/stat/sas/modules/ltow_transpose.htm
Or arrays:
http://www.ats.ucla.edu/stat/sas/modules/longtowide_data.htm
Also, it's rare that a wide format is more efficient than a long format. It's usually harder to calculate downstream indicators in my experience.
To what end?
To make it data driven you could use PROC TRANSPOSE.
DATA mydata;
input id visit op OPcode $ OP_date:DDMMYY10.;
FORMAT OP_date DDMMYYP10.;
datalines;
580 20 14 x5-135 13.04.2011
1663 1 1 x5-124 17.12.2004
1663 1 2 x5-344 17.12.2004
1663 2 1 x5-424 .
;;;;
run;
proc summary nway completetypes;
class visit op;
output out=frame(drop=_:);
run;
proc print;
run;
proc print data=mydata;
run;
data frameV / view=frameV;
set frame mydata;
run;
proc transpose data=frameV out=code(where=(not missing(ID))) prefix=V suffix=C delim=OP;
by id;
var opcode;
id visit op;
format visit op z2.;
run;
proc print;
run;
I remember that I had to do something quite similar when preparing data for Enterprise Miner.
For survival analysis too? I would love to learn from the scipt. I would be very grateful if I can have a look at it.
The macro basically looks like that:
%macro cumvars(crit);
retain
count_&crit
sum_&crit
;
if first.person_key
then do;
count_&crit = 0;
sum_&crit = 0;
end;
if critvar = "&crit"
then do;
count_&crit + 1;
sum_&crit + sumvalue;
end;
%mend;
This is of course just a basic template, in reality there were much more variables.
Since I work in an insurance company, the criterion was the insurance class, and we summed values like overall premium, new or changed risks in last year/last two years/last three years, claims (sum and number), and so on,
The data step then looks like
data person_base;
set claims_contracts;
by person_key;
%cumvars(cl1)
%cumvars(cl2)
%cumvars(cl3)
....
if last.person_key then output;
run;
For further automation, the macro calls could be done automatically from a list of classes.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.