BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
noviceuser74
Fluorite | Level 6

I have a dataset in which some variables belonging to the same subject have repeat values. I want to keep only the first one for some variables. For other variables, I want to keep all of the values even if repeated.
Here is a simplified version of the data, where I want to keep the first of Apples and Cherries, but all repeats of Bananas. (I had trouble getting this fake data to load properly so imagine that the '' cells are actually blank)

data WORK.have;
infile datalines missover delimiter=',';
input ID 1 Visit $ Apples $ Bananas $ Cherries $;
datalines;
1,Visit1,Apples,Bananas,''
1,Visit2,Apples,Bananas,''
2,Visit1,'',Bananas,Cherries
2,Visit2,Apples,Bananas,''
2,Visit3,Apples,Bananas,Cherries
3,Visit1,Apples,Bananas,''
3,Visit3,'','',Cherries
4,Visit1,Apples,Bananas,''
4,Visit2,'','',Cherries
4,Visit3,'','',Cherries
5,Visit1,Apples,Bananas,''
6,Visit1,Apples,Bananas,''
;

Here's what I want the dataset to look like

 

 

data WORK.want;
infile datalines missover delimiter=',';
input ID 1 Visit $ Apples $ Bananas $ Cherries $;
datalines;
1,Visit1,Apples,Bananas,''
1,Visit2,'',Bananas,''
2,Visit1,'',Bananas,Cherries
2,Visit2,Apples,Bananas,''
2,Visit3,'',Bananas,''
3,Visit1,Apples,Bananas,''
3,Visit3,'','',Cherries
4,Visit1,Apples,Bananas,''
4,Visit2,'','',Cherries
4,Visit3,'','',''
5,Visit1,Apples,Bananas,''
6,Visit1,Apples,Bananas,''
;


I tried code like this but it doesn't work because the first non-missing field is not always the first row

data want;
set have;
array onetime {2} $ Apples Cherries;
by id;
do i=1 to 2;
if first.id then onetime(i)=onetime(i);
else call missing(of onetime(i));
end;
run;

Appreciate your assistance! I am using SAS 9.4.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data WORK.have;
infile datalines missover delimiter=',' dsd;
input ID  Visit $ Apples $ Bananas $ Cherries $;
datalines;
1,Visit1,Apples,Bananas,''
1,Visit2,Apples,Bananas,''
2,Visit1,'',Bananas,Cherries
2,Visit2,Apples,Bananas,''
2,Visit3,Apples,Bananas,Cherries
3,Visit1,Apples,Bananas,''
3,Visit3,'','',Cherries
4,Visit1,Apples,Bananas,''
4,Visit2,'','',Cherries
4,Visit3,'','',Cherries
5,Visit1,Apples,Bananas,''
6,Visit1,Apples,Bananas,''
;
data want;
 if _n_=1 then do;
  if 0 then set have;
  declare hash a();
  a.definekey('id','Apples');
  a.definedone();

  declare hash b();
  b.definekey('id','Bananas');
  b.definedone();

  declare hash c();
  c.definekey('id','Cherries');
  c.definedone();
 end;
set have;
if a.check()=0 then call missing(Apples); else a.add();
if b.check()=0 then call missing(Bananas); else b.add();
if c.check()=0 then call missing(Cherries); else c.add();

run;

View solution in original post

4 REPLIES 4
Patrick
Opal | Level 21

Below should return the desired result.

data want(drop=_:);
  set have;
  by id;
  array _flg  {2} $1 _temporary_;
  array _vars {2} apples cherries;
  if first.id then call missing(of _flg[*]);
  do _i=1 to dim(_vars);
    if _flg[_i]='1' then call missing(_vars[_i]);
    if not missing(_vars[_i]) then _flg[_i]='1';
  end;
run;

The values in a _temporary_ array are retained.

We first check if _flg[<n>] is 1 and if it is set the related source variable to missing. Because the check that sets _flg[<n>] to 1 happens after this, the first time the source variable is populated _flg[<n>] will be missing.

 

noviceuser74
Fluorite | Level 6

Oh, I looked at the other posted solution first but yes, this works perfectly too! It's more aligned with how I was originally trying to solve it. Thank you!

Ksharp
Super User
data WORK.have;
infile datalines missover delimiter=',' dsd;
input ID  Visit $ Apples $ Bananas $ Cherries $;
datalines;
1,Visit1,Apples,Bananas,''
1,Visit2,Apples,Bananas,''
2,Visit1,'',Bananas,Cherries
2,Visit2,Apples,Bananas,''
2,Visit3,Apples,Bananas,Cherries
3,Visit1,Apples,Bananas,''
3,Visit3,'','',Cherries
4,Visit1,Apples,Bananas,''
4,Visit2,'','',Cherries
4,Visit3,'','',Cherries
5,Visit1,Apples,Bananas,''
6,Visit1,Apples,Bananas,''
;
data want;
 if _n_=1 then do;
  if 0 then set have;
  declare hash a();
  a.definekey('id','Apples');
  a.definedone();

  declare hash b();
  b.definekey('id','Bananas');
  b.definedone();

  declare hash c();
  c.definekey('id','Cherries');
  c.definedone();
 end;
set have;
if a.check()=0 then call missing(Apples); else a.add();
if b.check()=0 then call missing(Bananas); else b.add();
if c.check()=0 then call missing(Cherries); else c.add();

run;
noviceuser74
Fluorite | Level 6

This is amazing; I never would have thought of it. Works perfectly! In my case, since I do want to allow repeat of "bananas" I would simply omit the code relating to bananas.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 328 views
  • 3 likes
  • 3 in conversation