BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Katie
Obsidian | Level 7

Hello,

This is the data I have:

ID_VAR1     ID_VAR2     ID_VAR3     OTHERVAR_1     OTHERVAR_2     OTHERVAR_3

1                    xxx               1x1               x                         .                         y

1                    xxx               1x1               .                         z                         y

2                    zzz               2x2               p                        u                         g

2                    zzz               3x3               r                         f                          h

3                    nnn               4x4               .                         q                         .

3                    nnn               4x4               t                         q                         w

This is the data I would like:

ID_VAR1     ID_VAR2     ID_VAR3     OTHERVAR_1     OTHERVAR_2     OTHERVAR_3

1                    xxx               1x1               x                        z                         y

1                    xxx               1x1              x                        z                         y

2                    zzz               2x2               p                        u                         g

2                    zzz               3x3               r                         f                          h

3                    nnn               4x4               t                         q                         w

3                    nnn               4x4               t                         q                         w

           

So basically per unique combination of ID variables, I need to fill up and down other variables.  I have quite a few variables to do this on and would like the most efficient solution.

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

Efficient to write.

 

data un;
   input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (OTHERVAR_1-OTHERVAR_3)(:$1.);
   cards; 
1 xxx 1x1 x . y
1 xxx 1x1 . z y
2 zzz 2x2 p u g
2 zzz 3x3 r f h
3 nnn 4x4 . q .
3 nnn 4x4 t q w
;;;;
   run; 
proc print; 
   run; 
data down;
   update un(obs=0) un;
   by ID_:;
   array f first:;
   if f[dim(f)] then index=0; 
   index+1; 
   output; 
   run; 
proc sort data=down;
   key id_:;
   key index / descending; 
   run; 
data up;
   update down(obs=0) down;
   by id_:;
   output; 
   run; 
proc sort data=up;
   key id_: index;
   run; 
proc print; 
   run; 

Capture.PNG 

View solution in original post

19 REPLIES 19
data_null__
Jade | Level 19

Efficient to write or efficient to run?

Katie
Obsidian | Level 7

I would prefer something that is efficient to write.  As of right now I can only think of doing it variable by variable.  When I would rather have some code cycle through all the variables in one step.

data_null__
Jade | Level 19

Efficient to write.

 

data un;
   input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (OTHERVAR_1-OTHERVAR_3)(:$1.);
   cards; 
1 xxx 1x1 x . y
1 xxx 1x1 . z y
2 zzz 2x2 p u g
2 zzz 3x3 r f h
3 nnn 4x4 . q .
3 nnn 4x4 t q w
;;;;
   run; 
proc print; 
   run; 
data down;
   update un(obs=0) un;
   by ID_:;
   array f first:;
   if f[dim(f)] then index=0; 
   index+1; 
   output; 
   run; 
proc sort data=down;
   key id_:;
   key index / descending; 
   run; 
data up;
   update down(obs=0) down;
   by id_:;
   output; 
   run; 
proc sort data=up;
   key id_: index;
   run; 
proc print; 
   run; 

Capture.PNG 

data_null__
Jade | Level 19

I should add that using an array of first variables will not work if there are other variables in the data that have first as the first part of there names.  This can be avoided with the addion of validvarname=any as long as there are no names that begin with FIRST. (first dot).  For example.

data un;
   input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (firstvar_1-firstvar_3)(:$1.);
   cards;
1 xxx 1x1 x . y
1 xxx 1x1 . z y
2 zzz 2x2 p u g
2 zzz 3x3 r f h
3 nnn 4x4 . q .
3 nnn 4x4 t q w
;;;;
   run;
proc print;
  
run;
options validvarname=any;
data
down;
   update un(obs=0) un;
   by ID_:;
   array f
  • 'first.'n:;
       if f[dim(f)] then index=0;
       index+
    1;
      
    output;
      
    run;

    options validvarname=v7;

    proc sort data=down;

       key id_:;
       key index / descending;
      
    run;
    data up;
       update down(obs=0) down;
       by id_:;
       output;
      
    run;
    proc sort data=up;
       key id_: index;
       run;
    proc print;
      
    run;
    Peter_C
    Rhodochrosite | Level 12

    thank you

     

    That's a really neat way of generalising code to identify the last of (undefined) number of names in a list.

    It's not hard to accept the restriction, reserving names beginning "first" for the special by-group indicators, without even needing validvarname=any

    .

    Peter

    Katie
    Obsidian | Level 7

    I am trying to adapt the code you have previously shown me to only fill down a couple of variables, however I cannot figure out how to make it work.  Can you help?

    This is what I have so far:

    data down;

              update try(obs=0) try;

         by CPI;

         array f[2] Race Gender first:;

         if f[dim(f)] then index = 0;

         indext+1;

         ouput;

    run;

    The rest of the code I can do, it is just setting up the array to only update those two variables.

    Thanks!

    Katie

    data_null__
    Jade | Level 19

    Consider this modification to my original program with the addition of othervar_4-othervar_6  specifically the CALL MISSING after the output statement.  This effectively turns off the LOCF action for these variables.

    data un;
       input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (OTHERVAR_1-OTHERVAR_6)(:$1.);
       cards;
    1 xxx 1x1 x . y x . y
    1 xxx 1x1 . z y . z y
    2 zzz 2x2 p u g p u g
    2 zzz 3x3 r f h r f h
    3 nnn 4x4 . q . . q .
    3 nnn 4x4 t q w t q w
    ;;;;
       run;
    proc print;
      
    run;
    data down;
       update un(obs=0) un;
       by ID_:;
       array f
  • first:;
  •    if f[dim(f)] then index=0;
       index+
    1;
      
    output;
      
    call missing(of othervar_4-othervar_6);
       run;
    proc sort data=down;
       key id_:;
       key index / descending;
      
    run;
    data up;
       update down(obs=0) down;
       by id_:;
       output;
      
    call missing(of othervar_4-othervar_6);
       run;
    proc sort data=up;
       key id_: index;
       run;
    proc print;
      
    run;
    Katie
    Obsidian | Level 7

    Thank you so much for your reply.  What I'm finding is the variables in the Call statement are the ones NOT getting filled, while the others are.  Is there a way to specify only the ones you want to be filled?

    data_null__
    Jade | Level 19

    See if this is more suitable.  You create two macro the BY variables and the LOCF variables and drop and keep them as follows.  Notice the variables that are not keys or LOCF now come from SET statement. 

    data un;
       input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (OTHERVAR_1-OTHERVAR_6)(:$1.);
       cards;
    1 xxx 1x1 x . y x . y
    1 xxx 1x1 . z y . z y
    2 zzz 2x2 p u g p u g
    2 zzz 3x3 r f h r f h
    3 nnn 4x4 . q . . q .
    3 nnn 4x4 t q w t q w
    ;;;;
       run;
    proc print;
      
    run;
    %let by=ID_:;
    %let
    locf=othervar_1-othervar_3;
    data down;
       update un(obs=0 keep=&by) un(keep=&by &locf);
       by ID_:;
       set un(drop=&by &locf);
       array f
  • first:;
  •    if f[dim(f)] then index=0;
       index+
    1;
      
    output;
      
    run;
    proc sort data=down;
       key id_:;
       key index / descending;
      
    run;
    data up;
       update down(obs=0 keep=&by) down(keep=&by &locf);
       by id_:;
       set down(drop=&by &locf);
       output;
      
    run;
    proc sort data=up;
       key id_: index;
       run;
    proc print;
      
    run;
    Ksharp
    Super User

    Null,

    Why not DOW skill ?

    data un;
       input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (OTHERVAR_1-OTHERVAR_3)(:$1.);
       cards; 
    1 xxx 1x1 x . y
    1 xxx 1x1 . z y
    2 zzz 2x2 p u g
    2 zzz 3x3 r f h
    3 nnn 4x4 . q .
    3 nnn 4x4 t q w
    ;;;;
       run; 
    data want;
     length a1 a2 a3 $ 10;
     do until(last.ID_VAR3);
      set un;
      by ID_VAR1-ID_VAR3;
      a1=coalescec(a1,OTHERVAR_1);
      a2=coalescec(a2,OTHERVAR_2);
      a3=coalescec(a3,OTHERVAR_3);
     end;
     do until(last.ID_VAR3);
      set un;
      by ID_VAR1-ID_VAR3;
      OTHERVAR_1=a1;
      OTHERVAR_2=a2;
      OTHERVAR_3=a3;
      output;
     end;
     drop a1 a2 a3;
    run;
    
    
    
    

    Xia Keshan

    data_null__
    Jade | Level 19

    Because it doesn't work.  Add some observations so that the data is more realistic and you will see.

    Ksharp
    Super User

    Null,

    With all due respect , I agree with you . OP should add some data to explain his question clearer .

    Message was edited by: xia keshan

    Haikuo
    Onyx | Level 15

    If you only have less or equal two rows per (ID_VAR1 ID_VAR2 ID_VAR3), then PROC SQL and some unconventional usage of Max() function could be put into your advantage here: (Raw input was stolen from _null_'s post)

    data un;

    input (ID_VAR1-ID_VAR3)(:1. :$3. :$3.) (OTHERVAR_1-OTHERVAR_6)(:$1.);

    cards;

    1 xxx 1x1 x . y x . y

    1 xxx 1x1 . z y . z y

    2 zzz 2x2 p u g p u g

    2 zzz 3x3 r f h r f h

    3 nnn 4x4 . q . . q .

    3 nnn 4x4 t q w t q w

    ;;;;

    run;

    proc sql;

    select CAT('COALESCEC(',name,',', 'MAX(',NAME,')',') AS ',NAME) into :names SEPARATED BY ' , ' from DICTIONARY.COLUMNS

             WHERE LIBNAME='WORK'

               AND MEMNAME='UN'

               AND NAME NOT IN ('ID_VAR1' 'ID_VAR3' 'ID_VAR2')

    ;QUIT;

    proc sql;

    create table want as

            select  ID_VAR1, ID_VAR2, ID_VAR3 ,&NAMES FROM UN

            GROUP BY ID_VAR1, ID_VAR2, ID_VAR3

         ;

         QUIT;

    Haikuo

    Katie
    Obsidian | Level 7

    Here is an example of what my data looks like:

    Data Have -

    CPI EnrollDate VisitDate BP1    Race     Gender

    123 1/1/2001   1/1/2001  120/80  White     Female

    123 1/1/2001   2/1/2001  119/85 

    123 9/9/2009   9/9/2009  115/75  White

    123 9/9/2009   10/9/2009 

    456 2/2/2002   2/2/2002  118/85  Black      Male

    456 2/2/2002   3/2/2002  110/82  Black      Male

    789 3/3/2003   3/3/2003  135/81  Hispanic

    789 3/3/2003   4/3/2003  132/83  Hispanic   Male

    789 3/3/2003   5/3/2003                Male

    789 3/3/2003   6/3/2003  128/81 

    Data Want -

    CPI EnrollDate VisitDate  BP1      Race         Gender

    123 1/1/2001      1/1/2001 120/80    White         Female

    123 1/1/2001      2/1/2001 119/85    White         Female

    123 9/9/2009     9/9/2009  115/75    White         Female

    123 9/9/2009     10/9/2009               White         Female

    456 2/2/2002      2/2/2002 118/85    Black         Male

    456 2/2/2002      3/2/2002 110/82    Black         Male

    789 3/3/2003      3/3/2003 135/81    Hispanic    Male

    789 3/3/2003      4/3/2003 132/83    Hispanic    Male

    789 3/3/2003      5/3/2003                Hispanic    Male

    789 3/3/2003      6/3/2003 128/81   Hispanic    Male

    hackathon24-white-horiz.png

    The 2025 SAS Hackathon has begun!

    It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

    Latest Updates

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    SAS Training: Just a Click Away

     Ready to level-up your skills? Choose your own adventure.

    Browse our catalog!

    Discussion stats
    • 19 replies
    • 7313 views
    • 4 likes
    • 7 in conversation