BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ubai
Quartz | Level 8

I am transposing a huge dataset of patients' clinical data. Each row contains the information of an operation number, code and date in a single visit of one patient. I have a functioning code that transposes the operation number, code and date into variables.

 

The whole thing looks like this:

 

DATA mydata;
FORMAT OP_date DDMMYYP10.;
input id visit op$ OPcode$ OP_date DDMMYY10.;
datalines;
580 20 14 x5-135 13.04.2011
1663 1 1 x5-124 17.12.2004
1663 1 2 x5-344 17.12.2004
1663 2 1 x5-424 .
...... etc.
;
run;

PROC SORT DATA=mydata; 
BY ID visit OP; 
RUN;

DATA transC;
DO UNTIL (LAST.ID);
Retain ID V1OP1C V1OP1D V1OP2C etc ..;
FORMAT V1OP1D DDMMYYP10. V1OP2D DDMMYYP10. etc... ;
SET mydata; 
BY ID visit;
IF visit="1" AND OP="1" THEN DO; V1OP1C=OPcode; V1OP1D=OP_date;END;
IF visit="1" AND OP="2" THEN DO; V1OP2C=OPcode; V1OP2D=OP_date;END;
IF visit="2" etc ...
DROP visit OP OPcode OP_date;
RUN;

 

I would like to generate the variable names automatically in the IF statement so I wouldn't have to write myself all the combinations of all Visit and OP and also automatically generate the names of the new varibales.

 

It should be something like this:

The FORMAT statement:
FORMAT VX_OPYdate DDMMYYP10. etc... ;

The IF statement:
IF visit=X AND OP=Y THEN DO; VX_OPYcode=OPcode; VX_OPYdate=OP_datum; END;

I have tried it this with arrays, but it didn't work. Also my experience is limited, so any suggestions to other parts of the code are welcome 🙂

 

Thanks in forward

 

Ubai.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

For writing repeating code, the macro language provides the necessary toolset:

data mydata;
format OP_date DDMMYYP10.;
input id visit op$ OPcode$ OP_date DDMMYY10.;
datalines;
580 20 14 x5-135 13.04.2011
1663 1 1 x5-124 17.12.2004
1663 1 2 x5-344 17.12.2004
1663 2 1 x5-424 .
;
run;

proc sort data=mydata; 
by ID visit OP; 
run;

%macro repeat_code(max_v,max_o);
data transC;
format ID 10.
%do i = 1 %to &max_v;
  %do j = 1 %to &max_o;
  V&i.OP&j.C $8.
  V&i.OP&j.D DDMMYYP10.
  %end;
%end;
;
do until (last.ID);
  set mydata; 
  by ID;
%do i = 1 %to &max_v;
  %do j = 1 %to &max_o;
  if visit = &i and OP = "&j"
  then do;
    V&i.OP&j.C = OPcode;
    V&i.OP&j.D = OP_date;
  end;
  %end;
%end;
end;
drop visit OP OPcode OP_date;
run;
%mend;
%repeat_code(20,14)

View solution in original post

12 REPLIES 12
Kurt_Bremser
Super User

For writing repeating code, the macro language provides the necessary toolset:

data mydata;
format OP_date DDMMYYP10.;
input id visit op$ OPcode$ OP_date DDMMYY10.;
datalines;
580 20 14 x5-135 13.04.2011
1663 1 1 x5-124 17.12.2004
1663 1 2 x5-344 17.12.2004
1663 2 1 x5-424 .
;
run;

proc sort data=mydata; 
by ID visit OP; 
run;

%macro repeat_code(max_v,max_o);
data transC;
format ID 10.
%do i = 1 %to &max_v;
  %do j = 1 %to &max_o;
  V&i.OP&j.C $8.
  V&i.OP&j.D DDMMYYP10.
  %end;
%end;
;
do until (last.ID);
  set mydata; 
  by ID;
%do i = 1 %to &max_v;
  %do j = 1 %to &max_o;
  if visit = &i and OP = "&j"
  then do;
    V&i.OP&j.C = OPcode;
    V&i.OP&j.D = OP_date;
  end;
  %end;
%end;
end;
drop visit OP OPcode OP_date;
run;
%mend;
%repeat_code(20,14)
Ubai
Quartz | Level 8

@Kurt_Bremser

 

Dear Kurt,

 

thanks for the code. It helped alot and I did few steps afterwards. However, I was tryign to improve the code and shorten unnecessary parts. You have used the FORMAT statement in the macro. It keeps giving me the following warning:

 
WARNING: Multiple lengths were specified for the BY variable ID_F by input data sets and LENGTH, FORMAT, INFORMAT, or ATTRIB statements. This might cause unexpected results.
 
and when I try to delete the whoe statement I get an error:
 
Statement is not valid or it is used out of proper order.
 
In a previous step I have identified and formatted the variable ID.
 
Do you explain how SAS is understanding this and why am I getting this error? Thanks in farword.
Ubai
Quartz | Level 8

@Kurt_Bremser

 

It's the code you mentioned in your answer in this thread. In the macro code, third line you used a FORMAT statement. This is the statement I meant.

Reeza
Super User

It's missing a semi-colon.

 

EDIT: nope, but I suspect that's where the change happened to create the error anyways - you likely missed a semi-colon in the conversion or it needs an extra one for some reason. 

Ubai
Quartz | Level 8

Ok, I guess I found it out.

This ID variable in my code is a character variable that includes numbers as well. That's why it does have multiple lengths.

Reeza
Super User

Use two proc transposes and merge the data sets. 

http://www.ats.ucla.edu/stat/sas/modules/ltow_transpose.htm

 

Or arrays:

http://www.ats.ucla.edu/stat/sas/modules/longtowide_data.htm

 

Also, it's rare that a wide format is more efficient than a long format. It's usually harder to calculate downstream indicators in my experience. 

 

data_null__
Jade | Level 19

To what end?

 

To make it data driven you could use PROC TRANSPOSE. 

 

DATA mydata;
   input id visit op OPcode $ OP_date:DDMMYY10.;
   FORMAT OP_date DDMMYYP10.;
   datalines;
580 20 14 x5-135 13.04.2011
1663 1 1 x5-124 17.12.2004
1663 1 2 x5-344 17.12.2004
1663 2 1 x5-424 .
;;;;
   run;
proc summary nway completetypes;
   class visit op;
   output out=frame(drop=_:);
   run;
proc print;
   run;
proc print data=mydata;
   run;
data frameV / view=frameV;
   set frame mydata;
   run;
proc transpose data=frameV out=code(where=(not missing(ID))) prefix=V suffix=C delim=OP;
   by id;
   var opcode;
   id visit op;
   format visit op z2.;
   run;
proc print;
   run;

xCapture.PNG


 

Ubai
Quartz | Level 8
Thanks Kurt. It worked very well.

Reeza, thank you. I have transposed my data using all these methods to practice different techniques. However I found mine the most suitable for what I'm planning to do.

You might be correct. As I have said my data includes several rows for each patient and I would like to transpose the data to a wide format to calculate survival time to different end points in each patient. The way that the data is entered make it somewhat hard for me to write a program that do what I'm looking for without restructuring the data.

You can help a lot if you give a reference where I can learn how conduct a kaplan-meier or a cox regression over several observations for each patient.
I can post a sample of the real data if you want.
Ubai
Quartz | Level 8

For survival analysis too? I would love to learn from the scipt. I would be very grateful if I can have a look at it.

Kurt_Bremser
Super User

The macro basically looks like that:

%macro cumvars(crit);
retain
  count_&crit
  sum_&crit
;
if first.person_key
then do;
  count_&crit = 0;
  sum_&crit = 0;
end;
if critvar = "&crit"
then do;
  count_&crit + 1;
  sum_&crit + sumvalue;
end;
%mend;

This is of course just a basic template, in reality there were much more variables.

Since I work in an insurance company, the criterion was the insurance class, and we summed values like overall premium, new or changed risks in last year/last two years/last three years, claims (sum and number), and so on,

The data step then looks like

data person_base;
set claims_contracts;
by person_key;
%cumvars(cl1)
%cumvars(cl2)
%cumvars(cl3)
....
if last.person_key then output;
run;

For further automation, the macro calls could be done automatically from a list of classes.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 3256 views
  • 2 likes
  • 4 in conversation