BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
A_Halps
Obsidian | Level 7

I have a list of people who have visited in certain years. I want to look at the years they didn't visit too up to year 15. How do I add those numbers in?

 

Have:

Study_ID   Year

    001           9  

    001           9  

    001          10

    001          12

    001          15

    002           2

    002           6

    002           8

 

Want:

Study_ID   Year

    001           9  

    001           9  

    001          10

    001          11

    001          12

    001          13

    001          14

    001          15

    002           2

    002           3

    002           4

    002           5

    002           6

    002           7

    002           8

    002           9  

    002          10

    002          11

    002          12

    002          13

    002          14

    002          15

 

1 ACCEPTED SOLUTION

Accepted Solutions
biopharma
Quartz | Level 8

A lengthier solution to the one that is already posted but perhaps mine is more easy to understand😁

 

data have;
input Study_ID  $  Year;
cards;
    001           9  
    001           9  
    001          10
    001          12
    001          15
    002           2
    002           6
    002           8
;
run ;

proc sql ;
   create table temp as 
      select study_id, min(year) as minyear
         from have 
         group by study_id 
         order by study_id
      ;
quit ;

%let maxyear = 15 ; *set this value to what you desire ;
data template ;
   set temp ;
   by study_id ;
   
   do year = minyear to &maxyear ;
      output ;
   end ;
run ;

data want ;
   merge have template ;
   by study_id year ;
   drop minyear ;
run ;

View solution in original post

9 REPLIES 9
novinosrin
Tourmaline | Level 20

Hi @A_Halps  Assuming I understand your requirement-


data have;
input Study_ID  $  Year;
cards;
    001           9  

    001           9  

    001          10

    001          12

    001          15

    002           2

    002           6

    002           8
	;

data want;
 _n_=.;
 do until(last.study_id);
  set have;
  by study_id;
  _year=year;
  if _n_<year and _n_>. then 
   do _year=_n_+1 to year;
    output;
   end;
  else output;
  _n_=year;
 end;
 if _year<15 then do _year=_year to 15;
  output;
 end;
 drop year;
 rename _year=year;
run;
proc print noobs;run;

 

biopharma
Quartz | Level 8
For 001 the first two observations are duplicates. Is that how it appears in your real data or a typo?
biopharma
Quartz | Level 8

A lengthier solution to the one that is already posted but perhaps mine is more easy to understand😁

 

data have;
input Study_ID  $  Year;
cards;
    001           9  
    001           9  
    001          10
    001          12
    001          15
    002           2
    002           6
    002           8
;
run ;

proc sql ;
   create table temp as 
      select study_id, min(year) as minyear
         from have 
         group by study_id 
         order by study_id
      ;
quit ;

%let maxyear = 15 ; *set this value to what you desire ;
data template ;
   set temp ;
   by study_id ;
   
   do year = minyear to &maxyear ;
      output ;
   end ;
run ;

data want ;
   merge have template ;
   by study_id year ;
   drop minyear ;
run ;
A_Halps
Obsidian | Level 7
Thank you! I have many more columns than just study_id and year. How do I make sure that all of the other columns keep all of the same information from the previous year?
biopharma
Quartz | Level 8
There is a neat trick that I learnt just a few days ago from @novinosrin. Go ahead and give this a try. I have added a satellite variable WEIGHT to the HAVE dataset and tested. A partial print of the final data is produced.
 
data have;
input Study_ID  $  Year weight ;
cards;
    001           9  50
    001           9  50 
    001          10  51
    001          12  52
    001          15  54 
    002           2  60
    002           6  62
    002           8  64
;
run ;

proc sql ;
   create table temp as 
      select study_id, min(year) as minyear
         from have 
         group by study_id 
         order by study_id
      ;
quit ;

%let maxyear = 15 ; *set this value to what you desire ;
data template ;
   set temp ;
   by study_id ;
   
   do year = minyear to &maxyear ;
      output ;
   end ;
run ;

data want ;
   merge have template ;
   by study_id year ;
   drop minyear ;
run ;

data real_want ;
   update want (obs=0) want ;
   by study_id ;
   output ;
run ;
Obs Study_ID Year weight12345678
001950
001950
0011051
0011151
0011252
0011352
0011452
0011554
novinosrin
Tourmaline | Level 20

Hi @biopharma  Thank you for your modesty and mention of those pleasing kind words. Very satisfying indeed as I recall an amazing encouragement/cheer from @Reeza (a selfless/amazing beautiful person)  whose words were my 1st spark in this thread-https://communities.sas.com/t5/SAS-Programming/total-visits-and-average-score-in-a-datastep/td-p/411...  Could never thank her enough and will always be indebted forever. And yes, we do disagree in our banter for sure i.e. Tea is unacceptable, coffee is better. 

 

Okay, I would like some fun here too. lol. So extending my previous post with some hash fun 🙂



data have;
input Study_ID  $  Year weight ;
cards;
    001           9  50
    001           9  51 
    001          10  51
    001          12  52
    001          15  54 
    002           2  60
    002           6  62
    002           8  64
;
run ;
%let maxyear = 15 ; *set this value to what you desire ;

data want;
 if _n_=1 then do;
  dcl hash H (dataset:'have(obs=0)',multidata:'y') ;
  h.definekey  ("year") ;
  h.definedata (all:'y') ;
  h.definedone () ;
 end;
 _n_=.;
 do _iorc_=1 by 1 until(last.study_id);
  set have;
  by study_id;
  h.add();
 end;
 do _iorc_=1 to h.num_items;
  set have(keep=study_id year);
  if _n_<year and _n_>. then 
   do year=_n_+1 to year-1;
    output;
   end;
   if h.find()=0 then h.removedup();
   output;
   _n_=year;
 end;
 if year<15 then do year=year to &maxyear;
  output;
 end;
 h.clear();
run;

 

Have a great weekend and stay safe. 

 

 

 

 

biopharma
Quartz | Level 8
Hi @novinosrin,

Neat. I have a very rudimentary understanding of it but see the endless possibilities in the ways it is used. I am studying and perhaps one of these days will be able to post a solution with it. Fascinated and yes, I am looking into the book by @hashman and @DonH.
novinosrin
Tourmaline | Level 20

" Fascinated and yes, I am looking into the book by @hashmanand @DonH."-  Smart approach.You are in the right track. SAS universe begins and ends with those two. Can't agree more!

 

 

 

A_Halps
Obsidian | Level 7
Yes, It is a duplicate because it has a more specific date (and they are
different), but I am only looking at years right now. I just dont want to
delete anything.
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2383 views
  • 2 likes
  • 3 in conversation