My data looks like this:
I'm trying to make one observation for each by group with the values from the variables that are not missing. For instance the observation for the first group should be:
col0001 col0002 col0003 col0004 col0005 col0006 col0007 col0008 col0009 col0010 col0011 col0012
ID001002 Event1 03MAY2021 (54) 1 No No Yes No test 14 13
This is my code:
data support;
set data;
by col0001 col0002 col0003 col0004;
retain col_0006 col_0007 col_0008 col_0009 col_0010 col_0011 col_0012 col_0013 col_0014;
if first.col0004 then
do;
col_0006="";
col_0007="";
col_0008="";
col_0009="";
col_0010="";
col_0011="";
col_0012="";
col_0013="";
col_0014="";
end;
if not missing (col0006) then col_0006= col0006;
if not missing (col0007) then col_0007= col0007;
if not missing (col0008) then col_0008= col0008;
if not missing (col0009) then col_0009= col0009;
if not missing (col0010) then col_0010= col0010;
if not missing (col0011) then col_0011= col0011;
if not missing (col0012) then col_0012= col0012;
if not missing (col0013) then col_0013= col0013;
if not missing (col0014) then col_0014= col0014;
if last.col0004 then
output;
run;
And this is the output:
What am I doing wrong?
Your source data set, at least in the first pictures has variables that are on your RETAIN statement. Example Col_0006.
As such every time the data vector is read the value read from the existing data replaces the "retained" value.
So you don't get to change any of the retained variables.
Maybe this small example will demonstrate;
data have; input x; datalines; 1 2 3 ; data want; set have; retain x newx; newx= x; x= x+6; run;
You can see that Newx has the value from the Have dataset, indicating the value of X is reset when the next record is read.
I can't comment on the code, besides requesting the log as the code looks correct. My next guess would be that those cells that look missing may not be - issue with character fields sometimes. Look at the COMPRESS() function with the s modifier to remove those 'invisible blanks'.
However, if this is your goal:
I'm trying to make one observation for each by group with the values from the variables that are not missing.
Does something like this work though, SQL functions work on character columns so it's a nice lazy way to avoid this.
proc sql;
create table want as
select col0001, col0002, col0003, col0004, max(col0005) as col0005, max(col0006) as col0006
from have
group by 1, 2, 3, 4;
quit;
Or this:
data want2;
update have(obs=0) have;
by col001 col002 col003 col004;
run;
@Datino wrote:
My data looks like this:
I'm trying to make one observation for each by group with the values from the variables that are not missing. For instance the observation for the first group should be:
col0001 col0002 col0003 col0004 col0005 col0006 col0007 col0008 col0009 col0010 col0011 col0012
ID001002 Event1 03MAY2021 (54) 1 No No Yes No test 14 13
This is my code:
data support; set data; by col0001 col0002 col0003 col0004; retain col_0006 col_0007 col_0008 col_0009 col_0010 col_0011 col_0012 col_0013 col_0014; if first.col0004 then do; col_0006=""; col_0007=""; col_0008=""; col_0009=""; col_0010=""; col_0011=""; col_0012=""; col_0013=""; col_0014=""; end; if not missing (col0006) then col_0006= col0006; if not missing (col0007) then col_0007= col0007; if not missing (col0008) then col_0008= col0008; if not missing (col0009) then col_0009= col0009; if not missing (col0010) then col_0010= col0010; if not missing (col0011) then col_0011= col0011; if not missing (col0012) then col_0012= col0012; if not missing (col0013) then col_0013= col0013; if not missing (col0014) then col_0014= col0014; if last.col0004 then output; run;
And this is the output:
What am I doing wrong?
Thanks. I'm trying to avoid using sql as a learning exercise.
Your source data set, at least in the first pictures has variables that are on your RETAIN statement. Example Col_0006.
As such every time the data vector is read the value read from the existing data replaces the "retained" value.
So you don't get to change any of the retained variables.
Maybe this small example will demonstrate;
data have; input x; datalines; 1 2 3 ; data want; set have; retain x newx; newx= x; x= x+6; run;
You can see that Newx has the value from the Have dataset, indicating the value of X is reset when the next record is read.
So, the solution is to create new variables that are not in the source data set?
As @ballardw remarked, the variables that you retain are also on the input data. To make your code work, you can do something like this:
data support;
set data(drop=col_:);
by col0001 col0002 col0003 col0004;
if 0 then set data(keep=col_:);
And then the rest of your code unchanged. This way, the lengths of the COL_ variables are set by the second SET statement (which is never executed), just a lazy way of defining them without actually reading them.
Thanks, your code worked very well, although I'm not completely sure I understand what the if statement does.
if 0 then set data(keep=col_:);
The IF 0 THEN statement is, as I said, a lazy way of defining the variables mentioned in the KEEP= option. The data are never read, but the variables still get defined.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.