Solved: Re: Use DO Loop to create a series of variables

maxjiang6999 · Posted 08-12-2019 10:19 PM

Hi there,

I have two data sets: have and have1 (have1 is relatively unimportant so I didn't list it below. First I want to create a series of new variables (var_1 - var_42, however here I only listed var1-var4) to Have and to define var_i = date if count = i (please see the ideal output Want).

The code (listed right below) that I ran gave me a result but it's not the one that I want (listed at the bottom). Also is there an easier way than using %let to define a series of variables in a similar naming convention?

data want; set have;

array var (42) var_1 - var_42;

do i= 1 to 42;

if count = i then var(i) = date;

end;

format var_1 - var date9.;

drop i;

run;

%let var_i = var_1,var_2,var_3,var_4,var_5,var_6,var_7,var_8,var_9,var_10,var_11,var_12,var_13,var_14,var_15,var_16,var_17,var_18,var_19,

var_20,var_21,var_22,var_23,var_24,var_25,var_26,var_27,var_28,var_29,var_30,var_31,var_32,var_33,var_34,var_35,var_36,var_37,var_38,var_39,

var_40,var_41,var_42;

proc sql; create table want_1 as select a.*, b.&var_i from have1 a left join want b on a.id = b.id and a.year = b.year; quit;

Have

ID	Year	Date	Count
1	2006	5/31/2006	1
1	2007	6/7/2007	1
1	2008	7/8/2008	1
1	2009	7/13/2009	1
1	2009	7/8/2008	2
2	2009	12/23/2008	1
2	2009	1/30/2009	2
2	2009	9/30/2009	3
2	2009	12/15/2008	4

Want

ID	Year	Date	Count	var_1	var_2	var_3	var_4
1	2006	5/31/2006	1	5/31/2006
1	2007	6/7/2007	1	6/7/2007
1	2008	7/8/2008	1	7/8/2008
1	2009	7/13/2009	1	7/13/2009	7/8/2008
2	2009	12/23/2008	1	12/23/2008	1/30/2009	9/30/2009	12/15/2008

ID	Year	Date	Count	var_1	var_2	var_3	var_4
1	2006	5/31/2006	1	5/31/2006
1	2007	6/7/2007	1	6/7/2007
1	2008	7/8/2008	1	7/8/2008
1	2009	7/13/2009	1	7/13/2009
1	2009	7/8/2008	2		7/8/2008
2	2009	12/23/2008	1	12/23/2008
2	2009	1/30/2009	2		1/30/2009
2	2009	9/30/2009	3			9/30/2009
2	2009	12/15/2008	4				12/15/2008

PGStats · Posted 08-12-2019 11:41 PM

Here is some code:

data have;
input ID Year Date :mmddyy.;
format date yymmdd10.;
datalines;
1	2006	5/31/2006	
1	2007	6/7/2007	
1	2008	7/8/2008	
1	2009	7/13/2009	
1	2009	7/8/2008	
2	2009	12/23/2008	
2	2009	1/30/2009	
2	2009	9/30/2009	
2	2009	12/15/2008	
;

data want;
array v var_1-var_4;
format var_: yymmdd10.;
do i = 1 by 1 until (last.year);
    set have; by id year;
    if i <= dim(v) then v{i} = date;
    end;
drop i date;
run;

proc print data=want noobs; var id year var_:; run;

         ID    Year         var_1         var_2         var_3         var_4

          1    2006    2006-05-31             .             .             .
          1    2007    2007-06-07             .             .             .
          1    2008    2008-07-08             .             .             .
          1    2009    2009-07-13    2008-07-08             .             .
          2    2009    2008-12-23    2009-01-30    2009-09-30    2008-12-15

As for the SQL part of your question... I don't understand what it's about.

PG

View solution in original post

PGStats · Posted 08-12-2019 11:41 PM

Here is some code:

data have;
input ID Year Date :mmddyy.;
format date yymmdd10.;
datalines;
1	2006	5/31/2006	
1	2007	6/7/2007	
1	2008	7/8/2008	
1	2009	7/13/2009	
1	2009	7/8/2008	
2	2009	12/23/2008	
2	2009	1/30/2009	
2	2009	9/30/2009	
2	2009	12/15/2008	
;

data want;
array v var_1-var_4;
format var_: yymmdd10.;
do i = 1 by 1 until (last.year);
    set have; by id year;
    if i <= dim(v) then v{i} = date;
    end;
drop i date;
run;

proc print data=want noobs; var id year var_:; run;

         ID    Year         var_1         var_2         var_3         var_4

          1    2006    2006-05-31             .             .             .
          1    2007    2007-06-07             .             .             .
          1    2008    2008-07-08             .             .             .
          1    2009    2009-07-13    2008-07-08             .             .
          2    2009    2008-12-23    2009-01-30    2009-09-30    2008-12-15

As for the SQL part of your question... I don't understand what it's about.

PG

maxjiang6999 · Posted 08-13-2019 11:30 AM

Thank you so much. The code works well but the weird thing is why those var_ are added to the front of the table (not stack to the last column of the table).

I have an additional question: the reason why I set dim(var) to be 4 is I know the max(count) = 4. What if I don't know the max(count), is there a better way to do this?

Regarding SQL, I was trying to add those new var_ to another table. But you've seen when I have many var_, my code will be very tedious. So I'm wondering if there's a more efficient way to do it.

Reeza · Posted 08-13-2019 11:33 AM

The variables are added to the front of the data set because they're assigned before the SET statement.

Kurt_Bremser · Posted 08-13-2019 12:23 AM

Just as an aside: this code

do i= 1 to 42;
if count = i then var(i) = date;
end;

can be made much more efficient:

if 1 <= count <= 42 then var{count} = date;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

hashman · Posted 08-13-2019 12:24 AM

@maxjiang6999 :

Normally, I'd to this using the DATA step and arrays (just as @PGStats has done) or the hash object, as it is simpler for me to construct such a program than to recall how to code proc TRANSPOSE to attain the same goal. In this case, though, the latter is so straightforward that it quickly succumbs to trial and error:

data have ;                    
  input id year date :mmddyy. ;
  format date yymmdd10. ;      
  cards ;                      
1  2006  05/31/2006            
1  2007  06/07/2007            
1  2008  07/08/2008            
1  2009  07/13/2009            
1  2009  07/08/2008            
2  2009  12/23/2008            
2  2009  01/30/2009            
2  2009  09/30/2009            
2  2009  12/15/2008            
;                              
run ;                          
                               
proc transpose                 
  data   = have                
  out    = want (drop = _:)    
  prefix = var_                
  ;                            
  by id year ;                 
  var date ;                   
run ;

Note that I don't understand the logical coherence between the values of the variable COUNT in the input and output files; you'll have to work it out yourself if it's really needed.

Kind regards

Paul D.

maxjiang6999 · Posted 08-13-2019 11:38 AM

Hi Paul,

Thank you very much. PROC TRANSPOSE works too.

So count is a variable that I constructed, indicating how many events happened within the same year. So logically, COUNT should equal to the i of var_i. Essentially, i indicates the ith event within the year. Does this make sense to you?

Thanks,

Max

hashman · Posted 08-13-2019 12:08 PM

@maxjiang6999 : It does now, thanks.

Kurt_Bremser · Posted 08-13-2019 12:29 AM

For your want dataset, a proc transpose and merge should do;

proc transpose data=have out=trans prefix=var_;
var date;
by id year;
id count;

data want;
merge
  have
  trans
;
by id year;
if first.year;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

maxjiang6999 · Posted 08-13-2019 11:52 AM

Hi there, thanks! Merge step does save me lots of time!

SwissC · Posted 08-13-2019 05:54 AM

Regarding your SQL I think I understand what you need

PROC SQL;
  CREATE TABLE want_1 (drop=year1) AS SELECT
  a.*, b.*
  FROM have_1 a
  LEFT JOIN want (keep=year1 var_: rename=(year=year1)) b
    ON a.year=b.year1;
QUIT;

Reeza · Posted 08-13-2019 11:14 AM

I have to ask why? Why do you think this format is useful and what are you planning to do with it, because it seems like a really weird data structure to me.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away