topic Re: Duplicate observations with new StateID column? in SAS Programming

Duplicate observations with new StateID column?

imdickson — Wed, 31 Jul 2019 07:46:45 GMT

data have;
input Id Twin Age1 Age2 Sex1 $ Sex2 $ Cost;
infile datalines missover;
datalines;1 1 45 35 M F 10
2 0 32 .  F  20
3 0 26 .  M  30
4 1 20 21 M F .;

With the dataset above, is there a way for me to duplicate the record set with a new column called "StateID" with every 1 full set having 1 StateID. For example, first full set of data will have StateId=1. 2nd set onwards will have Stateid=2 and all the way until the 13th set which will have StateId=13.

1 full set = 34 rows.

How can i achieve that?

Re: Duplicate observations with new StateID column?

Kurt_Bremser — Wed, 31 Jul 2019 07:49:45 GMT

So you have a dataset with 34 * 13 observations, and you want to increment your new column every 34 observations?

Re: Duplicate observations with new StateID column?

imdickson — Wed, 31 Jul 2019 07:52:04 GMT

Meaning to say 1 full set of data has 34 rows.

What i want is every 34 rows of record, stateid will increase by 1.

So the first 34 rows will have StateID=1.

Second 34 rows will have StateID=2 and so forth.

Re: Duplicate observations with new StateID column?

imdickson — Wed, 31 Jul 2019 07:54:54 GMT

Currently there is only 34 rows of records in this table.

Re: Duplicate observations with new StateID column?

Kurt_Bremser — Wed, 31 Jul 2019 08:07:55 GMT

Simple do loop, and sorting afterwards:

data want;
set have;
do StateID = 1 to 13;
  output;
end;
run;

proc sort data=want;
by StateID Id;
run;

Re: Duplicate observations with new StateID column?

hashman — Wed, 31 Jul 2019 08:50:55 GMT

@imdickson:

This is perhaps a relatively efficient way since it reads the input data set once but requires sorting if you want WANT to be sorted by StateID:

data have ;                                        
  retain stateID ;                                 
  input Id Twin Age1 Age2 (Sex1 Sex2) (:$1.) Cost ;
  cards ;                                          
1 1 45 35 M F 10                                   
2 0 32  . F . 20                                   
3 0 26  . M . 30                                   
4 1 20 21 M F  .                                   
;                                                  
run ;                                              
                                                   
data want ;                                        
  retain stateID ;                                 
  set have ;                                       
  do stateID = 1 to 13 ;                           
    output ;                                       
  end ;                                            
run ;                                              
                                                   
proc sort ;                                        
  by stateID ;                                     
run ;

This is a much less efficient way since you have to re-read the input data set 12 times; but you don't have to sort:

data want ;                        
  do StateID = 1 to 13 ;           
    do p = 1 to n ;                
      set have point = p nobs = n ;
      output ;                     
    end ;                          
  end ;                            
  stop ;                           
run ;

Yet another way, probably most efficient since it reads HAVE only once and requires no sorting or means to put StateID first in the variable list:

data v / view = v ;                           
  do StateID = 1 to 13 ;                      
    output ;                                  
  end ;                                       
run ;                                         
                                              
proc sql ;                                    
  create table want as select * from v, have ;
quit ;

It's still most efficient, even though the log says:

NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.

The note is valid; however, it doesn't testify to an inefficiency because the Cartesian product is exactly what here you're aiming to create.

Kind regards

Paul D.