BookmarkSubscribeRSS Feed
shoeGirl32
Calcite | Level 5

I have a dataset with unique id's.

I know there are multiple instances in the 2nd table of each id.

I would like to do a loop where I go through each ID in Dataset A,

Find records matching records in Dataset B, and append that data to the same row for that particular ID, in a new dataset that I created, where each set of dates found in Dataset B can be appended in one row of Dataset C.

 

I would need to incorporate a data step and a proc sql step together, but not sure how do this.

 

I want to create a dataset that has ID, statedate1, enddate1, startdate2, enddates2, etc...

 

thank you in advance

8 REPLIES 8
Reeza
Super User
If you can mock up some input data and expected output someone can help with code. Wordy questions get wordy answers, data gets code 🙂

I think an append and then sorting it out after may be a better approach, but I'll wait for some examples.
shoeGirl32
Calcite | Level 5

Ok so :

 

Data Set A: ID's

1234

12345

132456

 

Date Set B: Date Detail

ID       StartDate   Endate

1234   1/1/2019   1/20/2019

1234   2/20/2019  3/1/2019

12345 3/1/2019   3/31/2019

 

etc...

 

so i want to find EACH ID from Data Set A, that is in data set B

and create a record in a new dataset (Data Set C) that takes the ID, and then appends start Date1 enddate1 startDate2 Enddate2 across in 1 record.

 

 

Reeza
Super User
Please show your exact expected output.
shoeGirl32
Calcite | Level 5
Data Set C:

ID#1 Start1 End1 start2 End2 start3 end3
ID#2 start1 end1 start 2 end2

Reeza
Super User

Ok, I don't think this is that complicated. 

 

1. First get subset of interest

2. Transpose data

 

proc sql;
create table step1 as
select * from tableB
where ID in (select ID from tableA);
order by ID;
quit;

*transpose start_date;
proc transpose data=step1 out=step2 prefix=StartDate;
by id;
var startDate;
run;
proc transpose data=step1 out=step3 prefix=EndDate;
by id;
var EndDate;
run;

data want;
merge step2 step3; 
by ID;
run;
shoeGirl32
Calcite | Level 5

Thank you very much.  This worked very nicely.

One question though, how do i get my field names to alternate like startdate1, enddate1, start2, end2 etc instead of 

start1 start2 start3 end1 end2 end3 etc?

 

Thank you,

shoeGirl

 

hashman
Ammonite | Level 13

@shoeGirl32 :

If you had looked at my solution, you'd have realized that this is what the DATA _NULL_ step does; It first determined the maximum number of the elements in the variable list and then auto-composes it in the required order. 

 

If you are spooked by the hash object approach and prefer something more traditional, you can achieve the same as shown below. But no matter which approach you take, you will need to employ some method of auto-composing the list in the requisite variable order.

data one ;                                                         
  input id ;                                                       
  cards ;                                                          
4                                                                  
3                                                                  
1                                                                  
0                                                                  
2                                                                  
run ;                                                              
                                                                   
data two ;                                                         
  input id startDate endDate ;                                     
  cards ;                                                          
3 31 32                                                            
2 21 22                                                            
3 33 34                                                            
2 23 24                                                            
1 11 12                                                            
3 35 36                                                            
run ;                                                              
                                                                   
proc sql ;                                                         
  create table sone as select * from one order id ;                
  create table stwo as select * from two order id ;                
quit ;                                                             
                                                                   
data _null_ ;                                                      
  do q = 1 by 1 until (last.id) ;                                  
    set stwo end = z ;                                             
    by id ;                                                        
  end ;                                                            
  _iorc_ = _iorc_ max q ;                                          
  if z ;                                                           
  length s $ 32767 ;                                               
  do q = 1 to _iorc_ ;                                             
    s = catx (" ", s, cats ("startDate", q), cats ("endDate", q)) ;
  end ;                                                            
  call symputx ("s", s) ;                                          
run ;                                                              
                                                                   
data want ;                                                        
  do _n_ = 0 by 2 until (last.id) ;                                
    merge sone (in = one) stwo (in = two) ;                        
    by id ;                                                        
    if one and two ;                                               
    array dd [*] &s ;                                              
    dd [_n_ + 1] = startDate ;                                     
    dd [_n_ + 2] = endDate ;                                       
  end ;                                                            
run ;                                                              

Kind regards

Paul D. 

 

 

hashman
Ammonite | Level 13

@shoeGirl32 :

Procedurally speaking, you need to:

  1. determine the size N of the longest group by ID in data set TWO
  2. based on it, create a variable list startDate1 endDate1 ... startdateN endDateN, in this order
  3. read data set ONE and loop through the records for the matching IDs in data set TWO to populate the variable list

There many ways to execute this plan. Here's one, based purely on using the DATA step:

 

data one ;                                                         
  input id ;                                                       
  cards ;                                                          
4                                                                  
3                                                                  
1                                                                  
0                                                                  
2                                                                  
run ;                                                              
data two ;                                                         
  input id startDate endDate ;                                     
  cards ;                                                          
3 31 32                                                            
2 21 22                                                            
3 33 34                                                            
2 23 24                                                            
1 11 12                                                            
3 35 36                                                            
run ;                                                              
                                                                   
data _null_ ;                                                      
  dcl hash h () ;                                                  
  h.definekey ("id") ;                                             
  h.definedata ("q") ;                                             
  h.definedone () ;                                                
  dcl hiter hi ("h") ;                                             
  do until (z) ;                                                   
    set two end = z ;                                              
    if h.find() ne 0 then q = 1 ;                                  
    else                  q + 1 ;                                  
    h.replace() ;                                                  
  end ;                                                            
  do while (hi.next() = 0) ;                                       
    qmax = qmax max q ;                                            
  end ;                                                            
  length s $ 32767 ;                                               
  do q = 1 to qmax ;                                               
    s = catx (" ", s, cats ("startDate", q), cats ("endDate", q)) ;
  end ;                                                            
  call symputx ("s", s) ;                                          
run ;                                                              
                                                                   
data want (drop = startDate endDate q) ;                             
  if _n_ = 1 then do ;                                             
    if 0 then set two ;                                            
    dcl hash h (dataset:"two", multidata:"y") ;                    
    h.definekey ("id") ;                                           
    h.definedata ("startDate", "endDate") ;                        
    h.definedone () ;                                              
  end ;                                                            
  set one ;                                                        
  array dd [*] &s ;                                                
  do q = 0 by 2 while (h.do_over() = 0) ;                          
    dd [q + 1] = startDate ;                                       
    dd [q + 2] = endDate ;                                         
  end ;                                                            
  if q ;                                                           
run ;                                                              

Kind regards

 

Paul D.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 866 views
  • 0 likes
  • 3 in conversation