I have a dataset in which some variables belonging to the same subject have repeat values. I want to keep only the first one for some variables. For other variables, I want to keep all of the values even if repeated.
Here is a simplified version of the data, where I want to keep the first of Apples and Cherries, but all repeats of Bananas. (I had trouble getting this fake data to load properly so imagine that the '' cells are actually blank)
data WORK.have;
infile datalines missover delimiter=',';
input ID 1 Visit $ Apples $ Bananas $ Cherries $;
datalines;
1,Visit1,Apples,Bananas,''
1,Visit2,Apples,Bananas,''
2,Visit1,'',Bananas,Cherries
2,Visit2,Apples,Bananas,''
2,Visit3,Apples,Bananas,Cherries
3,Visit1,Apples,Bananas,''
3,Visit3,'','',Cherries
4,Visit1,Apples,Bananas,''
4,Visit2,'','',Cherries
4,Visit3,'','',Cherries
5,Visit1,Apples,Bananas,''
6,Visit1,Apples,Bananas,''
;
Here's what I want the dataset to look like
data WORK.want;
infile datalines missover delimiter=',';
input ID 1 Visit $ Apples $ Bananas $ Cherries $;
datalines;
1,Visit1,Apples,Bananas,''
1,Visit2,'',Bananas,''
2,Visit1,'',Bananas,Cherries
2,Visit2,Apples,Bananas,''
2,Visit3,'',Bananas,''
3,Visit1,Apples,Bananas,''
3,Visit3,'','',Cherries
4,Visit1,Apples,Bananas,''
4,Visit2,'','',Cherries
4,Visit3,'','',''
5,Visit1,Apples,Bananas,''
6,Visit1,Apples,Bananas,''
;
I tried code like this but it doesn't work because the first non-missing field is not always the first row
data want;
set have;
array onetime {2} $ Apples Cherries;
by id;
do i=1 to 2;
if first.id then onetime(i)=onetime(i);
else call missing(of onetime(i));
end;
run;
Appreciate your assistance! I am using SAS 9.4.
data WORK.have;
infile datalines missover delimiter=',' dsd;
input ID Visit $ Apples $ Bananas $ Cherries $;
datalines;
1,Visit1,Apples,Bananas,''
1,Visit2,Apples,Bananas,''
2,Visit1,'',Bananas,Cherries
2,Visit2,Apples,Bananas,''
2,Visit3,Apples,Bananas,Cherries
3,Visit1,Apples,Bananas,''
3,Visit3,'','',Cherries
4,Visit1,Apples,Bananas,''
4,Visit2,'','',Cherries
4,Visit3,'','',Cherries
5,Visit1,Apples,Bananas,''
6,Visit1,Apples,Bananas,''
;
data want;
if _n_=1 then do;
if 0 then set have;
declare hash a();
a.definekey('id','Apples');
a.definedone();
declare hash b();
b.definekey('id','Bananas');
b.definedone();
declare hash c();
c.definekey('id','Cherries');
c.definedone();
end;
set have;
if a.check()=0 then call missing(Apples); else a.add();
if b.check()=0 then call missing(Bananas); else b.add();
if c.check()=0 then call missing(Cherries); else c.add();
run;
Below should return the desired result.
data want(drop=_:);
set have;
by id;
array _flg {2} $1 _temporary_;
array _vars {2} apples cherries;
if first.id then call missing(of _flg[*]);
do _i=1 to dim(_vars);
if _flg[_i]='1' then call missing(_vars[_i]);
if not missing(_vars[_i]) then _flg[_i]='1';
end;
run;
The values in a _temporary_ array are retained.
We first check if _flg[<n>] is 1 and if it is set the related source variable to missing. Because the check that sets _flg[<n>] to 1 happens after this, the first time the source variable is populated _flg[<n>] will be missing.
Oh, I looked at the other posted solution first but yes, this works perfectly too! It's more aligned with how I was originally trying to solve it. Thank you!
data WORK.have;
infile datalines missover delimiter=',' dsd;
input ID Visit $ Apples $ Bananas $ Cherries $;
datalines;
1,Visit1,Apples,Bananas,''
1,Visit2,Apples,Bananas,''
2,Visit1,'',Bananas,Cherries
2,Visit2,Apples,Bananas,''
2,Visit3,Apples,Bananas,Cherries
3,Visit1,Apples,Bananas,''
3,Visit3,'','',Cherries
4,Visit1,Apples,Bananas,''
4,Visit2,'','',Cherries
4,Visit3,'','',Cherries
5,Visit1,Apples,Bananas,''
6,Visit1,Apples,Bananas,''
;
data want;
if _n_=1 then do;
if 0 then set have;
declare hash a();
a.definekey('id','Apples');
a.definedone();
declare hash b();
b.definekey('id','Bananas');
b.definedone();
declare hash c();
c.definekey('id','Cherries');
c.definedone();
end;
set have;
if a.check()=0 then call missing(Apples); else a.add();
if b.check()=0 then call missing(Bananas); else b.add();
if c.check()=0 then call missing(Cherries); else c.add();
run;
This is amazing; I never would have thought of it. Works perfectly! In my case, since I do want to allow repeat of "bananas" I would simply omit the code relating to bananas.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.