BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
maxjiang6999
Calcite | Level 5

Hi there,


I have two data sets: have and have1 (have1 is relatively unimportant so I didn't list it below. First I want to create a series of new variables (var_1 - var_42, however here I only listed var1-var4) to Have and to define var_i = date if count = i (please see the ideal output Want).

 

The code (listed right below) that I ran gave me a result but it's not the one that I want (listed at the bottom). Also is there an easier way than using %let to define a series of variables in a similar naming convention?

 

data want; set have;

array var (42) var_1 - var_42;

do i= 1 to 42;

if count = i then var(i) = date;

end;

format var_1 - var date9.;

drop i;

run;

 

%let var_i = var_1,var_2,var_3,var_4,var_5,var_6,var_7,var_8,var_9,var_10,var_11,var_12,var_13,var_14,var_15,var_16,var_17,var_18,var_19,

var_20,var_21,var_22,var_23,var_24,var_25,var_26,var_27,var_28,var_29,var_30,var_31,var_32,var_33,var_34,var_35,var_36,var_37,var_38,var_39,

var_40,var_41,var_42;

 

proc sql; create table want_1 as select a.*, b.&var_i from have1 a left join want b on a.id = b.id and a.year = b.year; quit;

 

 

Have

IDYearDateCount
120065/31/20061
120076/7/20071
120087/8/20081
120097/13/20091
120097/8/20082
2200912/23/20081
220091/30/20092
220099/30/20093
2200912/15/20084

 

Want

IDYearDateCountvar_1var_2var_3var_4
120065/31/200615/31/2006   
120076/7/200716/7/2007   
120087/8/200817/8/2008   
120097/13/200917/13/20097/8/2008  
2200912/23/2008112/23/20081/30/20099/30/200912/15/2008

 

 

IDYearDateCountvar_1var_2var_3var_4
120065/31/200615/31/2006   
120076/7/200716/7/2007   
120087/8/200817/8/2008   
120097/13/200917/13/2009   
120097/8/20082 7/8/2008  
2200912/23/2008112/23/2008   
220091/30/20092 1/30/2009  
220099/30/20093  9/30/2009 
2200912/15/20084   12/15/2008
1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Here is some code:

 

data have;
input ID Year Date :mmddyy.;
format date yymmdd10.;
datalines;
1	2006	5/31/2006	
1	2007	6/7/2007	
1	2008	7/8/2008	
1	2009	7/13/2009	
1	2009	7/8/2008	
2	2009	12/23/2008	
2	2009	1/30/2009	
2	2009	9/30/2009	
2	2009	12/15/2008	
;

data want;
array v var_1-var_4;
format var_: yymmdd10.;
do i = 1 by 1 until (last.year);
    set have; by id year;
    if i <= dim(v) then v{i} = date;
    end;
drop i date;
run;

proc print data=want noobs; var id year var_:; run;
         ID    Year         var_1         var_2         var_3         var_4

          1    2006    2006-05-31             .             .             .
          1    2007    2007-06-07             .             .             .
          1    2008    2008-07-08             .             .             .
          1    2009    2009-07-13    2008-07-08             .             .
          2    2009    2008-12-23    2009-01-30    2009-09-30    2008-12-15

As for the SQL part of your question... I don't understand what it's about.

PG

View solution in original post

11 REPLIES 11
PGStats
Opal | Level 21

Here is some code:

 

data have;
input ID Year Date :mmddyy.;
format date yymmdd10.;
datalines;
1	2006	5/31/2006	
1	2007	6/7/2007	
1	2008	7/8/2008	
1	2009	7/13/2009	
1	2009	7/8/2008	
2	2009	12/23/2008	
2	2009	1/30/2009	
2	2009	9/30/2009	
2	2009	12/15/2008	
;

data want;
array v var_1-var_4;
format var_: yymmdd10.;
do i = 1 by 1 until (last.year);
    set have; by id year;
    if i <= dim(v) then v{i} = date;
    end;
drop i date;
run;

proc print data=want noobs; var id year var_:; run;
         ID    Year         var_1         var_2         var_3         var_4

          1    2006    2006-05-31             .             .             .
          1    2007    2007-06-07             .             .             .
          1    2008    2008-07-08             .             .             .
          1    2009    2009-07-13    2008-07-08             .             .
          2    2009    2008-12-23    2009-01-30    2009-09-30    2008-12-15

As for the SQL part of your question... I don't understand what it's about.

PG
maxjiang6999
Calcite | Level 5

Thank you so much. The code works well but the weird thing is why those var_ are added to the front of the table (not stack to the last column of the table). 

I have an additional question: the reason why I set dim(var) to be 4 is I know the max(count) = 4. What if I don't know the max(count), is there a better way to do this?

Regarding SQL, I was trying to add those new var_ to another table. But you've seen when I have many var_, my code will be very tedious. So I'm wondering if there's a more efficient way to do it.

Reeza
Super User
The variables are added to the front of the data set because they're assigned before the SET statement.
Kurt_Bremser
Super User

Just as an aside: this code

do i= 1 to 42;
if count = i then var(i) = date;
end;

can be made much more efficient:

if 1 <= count <= 42 then var{count} = date;
hashman
Ammonite | Level 13

@maxjiang6999 :

Normally, I'd to this using the DATA step and arrays (just as @PGStats has done) or the hash object, as it is simpler for me to construct such a program than to recall how to code proc TRANSPOSE to attain the same goal. In this case, though, the latter is so straightforward that it quickly succumbs to trial and error:

data have ;                    
  input id year date :mmddyy. ;
  format date yymmdd10. ;      
  cards ;                      
1  2006  05/31/2006            
1  2007  06/07/2007            
1  2008  07/08/2008            
1  2009  07/13/2009            
1  2009  07/08/2008            
2  2009  12/23/2008            
2  2009  01/30/2009            
2  2009  09/30/2009            
2  2009  12/15/2008            
;                              
run ;                          
                               
proc transpose                 
  data   = have                
  out    = want (drop = _:)    
  prefix = var_                
  ;                            
  by id year ;                 
  var date ;                   
run ;                          

Note that I don't understand the logical coherence between the values of the variable COUNT in the input and output files; you'll have to work it out yourself if it's really needed.

 

Kind regards

Paul D.

maxjiang6999
Calcite | Level 5

Hi Paul,

 

Thank you very much. PROC TRANSPOSE works too. 

So count is a variable that I constructed, indicating how many events happened within the same year. So logically, COUNT should equal to the i of var_i. Essentially, i indicates the ith event within the year. Does this make sense to you?


Thanks,

Max

hashman
Ammonite | Level 13

@maxjiang6999 : It does now, thanks.

Kurt_Bremser
Super User

For your want dataset, a proc transpose and merge should do;

proc transpose data=have out=trans prefix=var_;
var date;
by id year;
id count;

data want;
merge
  have
  trans
;
by id year;
if first.year;
run;
maxjiang6999
Calcite | Level 5

Hi there, thanks! Merge step does save me lots of time!

SwissC
Obsidian | Level 7

Regarding your SQL I think I understand what you need

 

PROC SQL;
  CREATE TABLE want_1 (drop=year1) AS SELECT
  a.*, b.*
  FROM have_1 a
  LEFT JOIN want (keep=year1 var_: rename=(year=year1)) b
    ON a.year=b.year1;
QUIT;
Reeza
Super User
I have to ask why? Why do you think this format is useful and what are you planning to do with it, because it seems like a really weird data structure to me.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1959 views
  • 4 likes
  • 6 in conversation