BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
A_Halps
Obsidian | Level 7

I have a list of people who have visited in certain years. I want to look at the years they didn't visit too up to year 15. How do I add those numbers in?

 

Have:

Study_ID   Year

    001           9  

    001           9  

    001          10

    001          12

    001          15

    002           2

    002           6

    002           8

 

Want:

Study_ID   Year

    001           9  

    001           9  

    001          10

    001          11

    001          12

    001          13

    001          14

    001          15

    002           2

    002           3

    002           4

    002           5

    002           6

    002           7

    002           8

    002           9  

    002          10

    002          11

    002          12

    002          13

    002          14

    002          15

 

1 ACCEPTED SOLUTION

Accepted Solutions
biopharma
Quartz | Level 8

A lengthier solution to the one that is already posted but perhaps mine is more easy to understand😁

 

data have;
input Study_ID  $  Year;
cards;
    001           9  
    001           9  
    001          10
    001          12
    001          15
    002           2
    002           6
    002           8
;
run ;

proc sql ;
   create table temp as 
      select study_id, min(year) as minyear
         from have 
         group by study_id 
         order by study_id
      ;
quit ;

%let maxyear = 15 ; *set this value to what you desire ;
data template ;
   set temp ;
   by study_id ;
   
   do year = minyear to &maxyear ;
      output ;
   end ;
run ;

data want ;
   merge have template ;
   by study_id year ;
   drop minyear ;
run ;

View solution in original post

9 REPLIES 9
novinosrin
Tourmaline | Level 20

Hi @A_Halps  Assuming I understand your requirement-


data have;
input Study_ID  $  Year;
cards;
    001           9  

    001           9  

    001          10

    001          12

    001          15

    002           2

    002           6

    002           8
	;

data want;
 _n_=.;
 do until(last.study_id);
  set have;
  by study_id;
  _year=year;
  if _n_<year and _n_>. then 
   do _year=_n_+1 to year;
    output;
   end;
  else output;
  _n_=year;
 end;
 if _year<15 then do _year=_year to 15;
  output;
 end;
 drop year;
 rename _year=year;
run;
proc print noobs;run;

 

biopharma
Quartz | Level 8
For 001 the first two observations are duplicates. Is that how it appears in your real data or a typo?
biopharma
Quartz | Level 8

A lengthier solution to the one that is already posted but perhaps mine is more easy to understand😁

 

data have;
input Study_ID  $  Year;
cards;
    001           9  
    001           9  
    001          10
    001          12
    001          15
    002           2
    002           6
    002           8
;
run ;

proc sql ;
   create table temp as 
      select study_id, min(year) as minyear
         from have 
         group by study_id 
         order by study_id
      ;
quit ;

%let maxyear = 15 ; *set this value to what you desire ;
data template ;
   set temp ;
   by study_id ;
   
   do year = minyear to &maxyear ;
      output ;
   end ;
run ;

data want ;
   merge have template ;
   by study_id year ;
   drop minyear ;
run ;
A_Halps
Obsidian | Level 7
Thank you! I have many more columns than just study_id and year. How do I make sure that all of the other columns keep all of the same information from the previous year?
biopharma
Quartz | Level 8
There is a neat trick that I learnt just a few days ago from @novinosrin. Go ahead and give this a try. I have added a satellite variable WEIGHT to the HAVE dataset and tested. A partial print of the final data is produced.
 
data have;
input Study_ID  $  Year weight ;
cards;
    001           9  50
    001           9  50 
    001          10  51
    001          12  52
    001          15  54 
    002           2  60
    002           6  62
    002           8  64
;
run ;

proc sql ;
   create table temp as 
      select study_id, min(year) as minyear
         from have 
         group by study_id 
         order by study_id
      ;
quit ;

%let maxyear = 15 ; *set this value to what you desire ;
data template ;
   set temp ;
   by study_id ;
   
   do year = minyear to &maxyear ;
      output ;
   end ;
run ;

data want ;
   merge have template ;
   by study_id year ;
   drop minyear ;
run ;

data real_want ;
   update want (obs=0) want ;
   by study_id ;
   output ;
run ;
Obs Study_ID Year weight12345678
001950
001950
0011051
0011151
0011252
0011352
0011452
0011554
novinosrin
Tourmaline | Level 20

Hi @biopharma  Thank you for your modesty and mention of those pleasing kind words. Very satisfying indeed as I recall an amazing encouragement/cheer from @Reeza (a selfless/amazing beautiful person)  whose words were my 1st spark in this thread-https://communities.sas.com/t5/SAS-Programming/total-visits-and-average-score-in-a-datastep/td-p/411...  Could never thank her enough and will always be indebted forever. And yes, we do disagree in our banter for sure i.e. Tea is unacceptable, coffee is better. 

 

Okay, I would like some fun here too. lol. So extending my previous post with some hash fun 🙂



data have;
input Study_ID  $  Year weight ;
cards;
    001           9  50
    001           9  51 
    001          10  51
    001          12  52
    001          15  54 
    002           2  60
    002           6  62
    002           8  64
;
run ;
%let maxyear = 15 ; *set this value to what you desire ;

data want;
 if _n_=1 then do;
  dcl hash H (dataset:'have(obs=0)',multidata:'y') ;
  h.definekey  ("year") ;
  h.definedata (all:'y') ;
  h.definedone () ;
 end;
 _n_=.;
 do _iorc_=1 by 1 until(last.study_id);
  set have;
  by study_id;
  h.add();
 end;
 do _iorc_=1 to h.num_items;
  set have(keep=study_id year);
  if _n_<year and _n_>. then 
   do year=_n_+1 to year-1;
    output;
   end;
   if h.find()=0 then h.removedup();
   output;
   _n_=year;
 end;
 if year<15 then do year=year to &maxyear;
  output;
 end;
 h.clear();
run;

 

Have a great weekend and stay safe. 

 

 

 

 

biopharma
Quartz | Level 8
Hi @novinosrin,

Neat. I have a very rudimentary understanding of it but see the endless possibilities in the ways it is used. I am studying and perhaps one of these days will be able to post a solution with it. Fascinated and yes, I am looking into the book by @hashman and @DonH.
novinosrin
Tourmaline | Level 20

" Fascinated and yes, I am looking into the book by @hashmanand @DonH."-  Smart approach.You are in the right track. SAS universe begins and ends with those two. Can't agree more!

 

 

 

A_Halps
Obsidian | Level 7
Yes, It is a duplicate because it has a more specific date (and they are
different), but I am only looking at years right now. I just dont want to
delete anything.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 851 views
  • 2 likes
  • 3 in conversation