SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Automatic Generation of specific variable names in a DATA procedure

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Automatic Generation of specific variable names in a DATA procedure

[ Edited ]

I am transposing a huge dataset of patients' clinical data. Each row contains the information of an operation number, code and date in a single visit of one patient. I have a functioning code that transposes the operation number, code and date into variables.

 

The whole thing looks like this:

 

DATA mydata;
FORMAT OP_date DDMMYYP10.;
input id visit op$ OPcode$ OP_date DDMMYY10.;
datalines;
580 20 14 x5-135 13.04.2011
1663 1 1 x5-124 17.12.2004
1663 1 2 x5-344 17.12.2004
1663 2 1 x5-424 .
...... etc.
;
run;

PROC SORT DATA=mydata; 
BY ID visit OP; 
RUN;

DATA transC;
DO UNTIL (LAST.ID);
Retain ID V1OP1C V1OP1D V1OP2C etc ..;
FORMAT V1OP1D DDMMYYP10. V1OP2D DDMMYYP10. etc... ;
SET mydata; 
BY ID visit;
IF visit="1" AND OP="1" THEN DO; V1OP1C=OPcode; V1OP1D=OP_date;END;
IF visit="1" AND OP="2" THEN DO; V1OP2C=OPcode; V1OP2D=OP_date;END;
IF visit="2" etc ...
DROP visit OP OPcode OP_date;
RUN;

 

I would like to generate the variable names automatically in the IF statement so I wouldn't have to write myself all the combinations of all Visit and OP and also automatically generate the names of the new varibales.

 

It should be something like this:

The FORMAT statement:
FORMAT VX_OPYdate DDMMYYP10. etc... ;

The IF statement:
IF visit=X AND OP=Y THEN DO; VX_OPYcode=OPcode; VX_OPYdate=OP_datum; END;

I have tried it this with arrays, but it didn't work. Also my experience is limited, so any suggestions to other parts of the code are welcome Smiley Happy

 

Thanks in forward

 

Ubai.


Accepted Solutions
Solution
‎01-10-2017 10:08 AM
Super User
Posts: 6,927

Re: Automatic Generation of specific variable names in a DATA procedure

For writing repeating code, the macro language provides the necessary toolset:

data mydata;
format OP_date DDMMYYP10.;
input id visit op$ OPcode$ OP_date DDMMYY10.;
datalines;
580 20 14 x5-135 13.04.2011
1663 1 1 x5-124 17.12.2004
1663 1 2 x5-344 17.12.2004
1663 2 1 x5-424 .
;
run;

proc sort data=mydata; 
by ID visit OP; 
run;

%macro repeat_code(max_v,max_o);
data transC;
format ID 10.
%do i = 1 %to &max_v;
  %do j = 1 %to &max_o;
  V&i.OP&j.C $8.
  V&i.OP&j.D DDMMYYP10.
  %end;
%end;
;
do until (last.ID);
  set mydata; 
  by ID;
%do i = 1 %to &max_v;
  %do j = 1 %to &max_o;
  if visit = &i and OP = "&j"
  then do;
    V&i.OP&j.C = OPcode;
    V&i.OP&j.D = OP_date;
  end;
  %end;
%end;
end;
drop visit OP OPcode OP_date;
run;
%mend;
%repeat_code(20,14)
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Solution
‎01-10-2017 10:08 AM
Super User
Posts: 6,927

Re: Automatic Generation of specific variable names in a DATA procedure

For writing repeating code, the macro language provides the necessary toolset:

data mydata;
format OP_date DDMMYYP10.;
input id visit op$ OPcode$ OP_date DDMMYY10.;
datalines;
580 20 14 x5-135 13.04.2011
1663 1 1 x5-124 17.12.2004
1663 1 2 x5-344 17.12.2004
1663 2 1 x5-424 .
;
run;

proc sort data=mydata; 
by ID visit OP; 
run;

%macro repeat_code(max_v,max_o);
data transC;
format ID 10.
%do i = 1 %to &max_v;
  %do j = 1 %to &max_o;
  V&i.OP&j.C $8.
  V&i.OP&j.D DDMMYYP10.
  %end;
%end;
;
do until (last.ID);
  set mydata; 
  by ID;
%do i = 1 %to &max_v;
  %do j = 1 %to &max_o;
  if visit = &i and OP = "&j"
  then do;
    V&i.OP&j.C = OPcode;
    V&i.OP&j.D = OP_date;
  end;
  %end;
%end;
end;
drop visit OP OPcode OP_date;
run;
%mend;
%repeat_code(20,14)
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 9

Re: Automatic Generation of specific variable names in a DATA procedure

@KurtBremser

 

Dear Kurt,

 

thanks for the code. It helped alot and I did few steps afterwards. However, I was tryign to improve the code and shorten unnecessary parts. You have used the FORMAT statement in the macro. It keeps giving me the following warning:

 
WARNING: Multiple lengths were specified for the BY variable ID_F by input data sets and LENGTH, FORMAT, INFORMAT, or ATTRIB statements. This might cause unexpected results.
 
and when I try to delete the whoe statement I get an error:
 
Statement is not valid or it is used out of proper order.
 
In a previous step I have identified and formatted the variable ID.
 
Do you explain how SAS is understanding this and why am I getting this error? Thanks in farword.
Super User
Posts: 6,927

Re: Automatic Generation of specific variable names in a DATA procedure

You will have to post the code for inspection.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 9

Re: Automatic Generation of specific variable names in a DATA procedure

@KurtBremser

 

It's the code you mentioned in your answer in this thread. In the macro code, third line you used a FORMAT statement. This is the statement I meant.

Super User
Posts: 17,730

Re: Automatic Generation of specific variable names in a DATA procedure

[ Edited ]

It's missing a semi-colon.

 

EDIT: nope, but I suspect that's where the change happened to create the error anyways - you likely missed a semi-colon in the conversion or it needs an extra one for some reason. 

Occasional Contributor
Posts: 9

Re: Automatic Generation of specific variable names in a DATA procedure

Ok, I guess I found it out.

This ID variable in my code is a character variable that includes numbers as well. That's why it does have multiple lengths.

Super User
Posts: 17,730

Re: Automatic Generation of specific variable names in a DATA procedure

Use two proc transposes and merge the data sets. 

http://www.ats.ucla.edu/stat/sas/modules/ltow_transpose.htm

 

Or arrays:

http://www.ats.ucla.edu/stat/sas/modules/longtowide_data.htm

 

Also, it's rare that a wide format is more efficient than a long format. It's usually harder to calculate downstream indicators in my experience. 

 

Respected Advisor
Posts: 3,777

Re: Automatic Generation of specific variable names in a DATA procedure

To what end?

 

To make it data driven you could use PROC TRANSPOSE. 

 

DATA mydata;
   input id visit op OPcode $ OP_date:DDMMYY10.;
   FORMAT OP_date DDMMYYP10.;
   datalines;
580 20 14 x5-135 13.04.2011
1663 1 1 x5-124 17.12.2004
1663 1 2 x5-344 17.12.2004
1663 2 1 x5-424 .
;;;;
   run;
proc summary nway completetypes;
   class visit op;
   output out=frame(drop=_:);
   run;
proc print;
   run;
proc print data=mydata;
   run;
data frameV / view=frameV;
   set frame mydata;
   run;
proc transpose data=frameV out=code(where=(not missing(ID))) prefix=V suffix=C delim=OP;
   by id;
   var opcode;
   id visit op;
   format visit op z2.;
   run;
proc print;
   run;

xCapture.PNG


 

Occasional Contributor
Posts: 9

Re: Automatic Generation of specific variable names in a DATA procedure

Thanks Kurt. It worked very well.

Reeza, thank you. I have transposed my data using all these methods to practice different techniques. However I found mine the most suitable for what I'm planning to do.

You might be correct. As I have said my data includes several rows for each patient and I would like to transpose the data to a wide format to calculate survival time to different end points in each patient. The way that the data is entered make it somewhat hard for me to write a program that do what I'm looking for without restructuring the data.

You can help a lot if you give a reference where I can learn how conduct a kaplan-meier or a cox regression over several observations for each patient.
I can post a sample of the real data if you want.
Super User
Posts: 6,927

Re: Automatic Generation of specific variable names in a DATA procedure

I remember that I had to do something quite similar when preparing data for Enterprise Miner.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 9

Re: Automatic Generation of specific variable names in a DATA procedure

For survival analysis too? I would love to learn from the scipt. I would be very grateful if I can have a look at it.

Super User
Posts: 6,927

Re: Automatic Generation of specific variable names in a DATA procedure

[ Edited ]

The macro basically looks like that:

%macro cumvars(crit);
retain
  count_&crit
  sum_&crit
;
if first.person_key
then do;
  count_&crit = 0;
  sum_&crit = 0;
end;
if critvar = "&crit"
then do;
  count_&crit + 1;
  sum_&crit + sumvalue;
end;
%mend;

This is of course just a basic template, in reality there were much more variables.

Since I work in an insurance company, the criterion was the insurance class, and we summed values like overall premium, new or changed risks in last year/last two years/last three years, claims (sum and number), and so on,

The data step then looks like

data person_base;
set claims_contracts;
by person_key;
%cumvars(cl1)
%cumvars(cl2)
%cumvars(cl3)
....
if last.person_key then output;
run;

For further automation, the macro calls could be done automatically from a list of classes.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 418 views
  • 2 likes
  • 4 in conversation