BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ubai
Quartz | Level 8

I am transposing a huge dataset of patients' clinical data. Each row contains the information of an operation number, code and date in a single visit of one patient. I have a functioning code that transposes the operation number, code and date into variables.

 

The whole thing looks like this:

 

DATA mydata;
FORMAT OP_date DDMMYYP10.;
input id visit op$ OPcode$ OP_date DDMMYY10.;
datalines;
580 20 14 x5-135 13.04.2011
1663 1 1 x5-124 17.12.2004
1663 1 2 x5-344 17.12.2004
1663 2 1 x5-424 .
...... etc.
;
run;

PROC SORT DATA=mydata; 
BY ID visit OP; 
RUN;

DATA transC;
DO UNTIL (LAST.ID);
Retain ID V1OP1C V1OP1D V1OP2C etc ..;
FORMAT V1OP1D DDMMYYP10. V1OP2D DDMMYYP10. etc... ;
SET mydata; 
BY ID visit;
IF visit="1" AND OP="1" THEN DO; V1OP1C=OPcode; V1OP1D=OP_date;END;
IF visit="1" AND OP="2" THEN DO; V1OP2C=OPcode; V1OP2D=OP_date;END;
IF visit="2" etc ...
DROP visit OP OPcode OP_date;
RUN;

 

I would like to generate the variable names automatically in the IF statement so I wouldn't have to write myself all the combinations of all Visit and OP and also automatically generate the names of the new varibales.

 

It should be something like this:

The FORMAT statement:
FORMAT VX_OPYdate DDMMYYP10. etc... ;

The IF statement:
IF visit=X AND OP=Y THEN DO; VX_OPYcode=OPcode; VX_OPYdate=OP_datum; END;

I have tried it this with arrays, but it didn't work. Also my experience is limited, so any suggestions to other parts of the code are welcome 🙂

 

Thanks in forward

 

Ubai.

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

For writing repeating code, the macro language provides the necessary toolset:

data mydata;
format OP_date DDMMYYP10.;
input id visit op$ OPcode$ OP_date DDMMYY10.;
datalines;
580 20 14 x5-135 13.04.2011
1663 1 1 x5-124 17.12.2004
1663 1 2 x5-344 17.12.2004
1663 2 1 x5-424 .
;
run;

proc sort data=mydata; 
by ID visit OP; 
run;

%macro repeat_code(max_v,max_o);
data transC;
format ID 10.
%do i = 1 %to &max_v;
  %do j = 1 %to &max_o;
  V&i.OP&j.C $8.
  V&i.OP&j.D DDMMYYP10.
  %end;
%end;
;
do until (last.ID);
  set mydata; 
  by ID;
%do i = 1 %to &max_v;
  %do j = 1 %to &max_o;
  if visit = &i and OP = "&j"
  then do;
    V&i.OP&j.C = OPcode;
    V&i.OP&j.D = OP_date;
  end;
  %end;
%end;
end;
drop visit OP OPcode OP_date;
run;
%mend;
%repeat_code(20,14)

View solution in original post

12 REPLIES 12
Kurt_Bremser
Super User

For writing repeating code, the macro language provides the necessary toolset:

data mydata;
format OP_date DDMMYYP10.;
input id visit op$ OPcode$ OP_date DDMMYY10.;
datalines;
580 20 14 x5-135 13.04.2011
1663 1 1 x5-124 17.12.2004
1663 1 2 x5-344 17.12.2004
1663 2 1 x5-424 .
;
run;

proc sort data=mydata; 
by ID visit OP; 
run;

%macro repeat_code(max_v,max_o);
data transC;
format ID 10.
%do i = 1 %to &max_v;
  %do j = 1 %to &max_o;
  V&i.OP&j.C $8.
  V&i.OP&j.D DDMMYYP10.
  %end;
%end;
;
do until (last.ID);
  set mydata; 
  by ID;
%do i = 1 %to &max_v;
  %do j = 1 %to &max_o;
  if visit = &i and OP = "&j"
  then do;
    V&i.OP&j.C = OPcode;
    V&i.OP&j.D = OP_date;
  end;
  %end;
%end;
end;
drop visit OP OPcode OP_date;
run;
%mend;
%repeat_code(20,14)
Ubai
Quartz | Level 8

@Kurt_Bremser

 

Dear Kurt,

 

thanks for the code. It helped alot and I did few steps afterwards. However, I was tryign to improve the code and shorten unnecessary parts. You have used the FORMAT statement in the macro. It keeps giving me the following warning:

 
WARNING: Multiple lengths were specified for the BY variable ID_F by input data sets and LENGTH, FORMAT, INFORMAT, or ATTRIB statements. This might cause unexpected results.
 
and when I try to delete the whoe statement I get an error:
 
Statement is not valid or it is used out of proper order.
 
In a previous step I have identified and formatted the variable ID.
 
Do you explain how SAS is understanding this and why am I getting this error? Thanks in farword.
Ubai
Quartz | Level 8

@Kurt_Bremser

 

It's the code you mentioned in your answer in this thread. In the macro code, third line you used a FORMAT statement. This is the statement I meant.

Reeza
Super User

It's missing a semi-colon.

 

EDIT: nope, but I suspect that's where the change happened to create the error anyways - you likely missed a semi-colon in the conversion or it needs an extra one for some reason. 

Ubai
Quartz | Level 8

Ok, I guess I found it out.

This ID variable in my code is a character variable that includes numbers as well. That's why it does have multiple lengths.

Reeza
Super User

Use two proc transposes and merge the data sets. 

http://www.ats.ucla.edu/stat/sas/modules/ltow_transpose.htm

 

Or arrays:

http://www.ats.ucla.edu/stat/sas/modules/longtowide_data.htm

 

Also, it's rare that a wide format is more efficient than a long format. It's usually harder to calculate downstream indicators in my experience. 

 

data_null__
Jade | Level 19

To what end?

 

To make it data driven you could use PROC TRANSPOSE. 

 

DATA mydata;
   input id visit op OPcode $ OP_date:DDMMYY10.;
   FORMAT OP_date DDMMYYP10.;
   datalines;
580 20 14 x5-135 13.04.2011
1663 1 1 x5-124 17.12.2004
1663 1 2 x5-344 17.12.2004
1663 2 1 x5-424 .
;;;;
   run;
proc summary nway completetypes;
   class visit op;
   output out=frame(drop=_:);
   run;
proc print;
   run;
proc print data=mydata;
   run;
data frameV / view=frameV;
   set frame mydata;
   run;
proc transpose data=frameV out=code(where=(not missing(ID))) prefix=V suffix=C delim=OP;
   by id;
   var opcode;
   id visit op;
   format visit op z2.;
   run;
proc print;
   run;

xCapture.PNG


 

Ubai
Quartz | Level 8
Thanks Kurt. It worked very well.

Reeza, thank you. I have transposed my data using all these methods to practice different techniques. However I found mine the most suitable for what I'm planning to do.

You might be correct. As I have said my data includes several rows for each patient and I would like to transpose the data to a wide format to calculate survival time to different end points in each patient. The way that the data is entered make it somewhat hard for me to write a program that do what I'm looking for without restructuring the data.

You can help a lot if you give a reference where I can learn how conduct a kaplan-meier or a cox regression over several observations for each patient.
I can post a sample of the real data if you want.
Ubai
Quartz | Level 8

For survival analysis too? I would love to learn from the scipt. I would be very grateful if I can have a look at it.

Kurt_Bremser
Super User

The macro basically looks like that:

%macro cumvars(crit);
retain
  count_&crit
  sum_&crit
;
if first.person_key
then do;
  count_&crit = 0;
  sum_&crit = 0;
end;
if critvar = "&crit"
then do;
  count_&crit + 1;
  sum_&crit + sumvalue;
end;
%mend;

This is of course just a basic template, in reality there were much more variables.

Since I work in an insurance company, the criterion was the insurance class, and we summed values like overall premium, new or changed risks in last year/last two years/last three years, claims (sum and number), and so on,

The data step then looks like

data person_base;
set claims_contracts;
by person_key;
%cumvars(cl1)
%cumvars(cl2)
%cumvars(cl3)
....
if last.person_key then output;
run;

For further automation, the macro calls could be done automatically from a list of classes.

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 3329 views
  • 2 likes
  • 4 in conversation