JLang055
Fluorite | Level 6

I have a patient dataset with multiple observations and variables. I need to delete duplicate (overlapping) observations for each participant. See the example below.

 

HAVE

FID    MOOD    ANXIETY    EAT    DEVELOP
 1       .        1        .        .
 1       .        .        1        .
 1       1        .        .        .
 1       1        .        .        .
 2       .        .        1        .
 2       .        1        .        .
 2       .        .        1        .
 3       .        1        .        .
 3       1        .        .        .

 

WANT

FID    MOOD    ANXIETY    EAT    DEVELOP
 1       .        1        .        .
 1       .        .        1        .
 1       1        .        .        .
 2       .        .        1        .
 2       .        1        .        .
 3       .        1        .        .
 3       1        .        .        .
Accepted Solutions
JLang055
Fluorite | Level 6

Hi All,

I took @Tom's suggestion to collapse the data into a single row per participant. I ran the following code, which worked for me.

%macro count (Var); *create new diagnosis count variable;

proc summary data=DIAG_FULL; 
	var &var;

	by FID;

	output out=&var sum=;
run;

data &VAR;
	set &VAR;

	IF &VAR >= 1 then &VAR =1;
	ELSE &VAR =0;

	drop _TYPE_ _FREQ_;
run;

%mend count;

%count(MOOD);
%count(ANXIETY);
%count(DEVELOP);
%count(CONDUCT);
%count(ADHD);

data CLINIC_DIAG;
	merge MOOD ANXIETY DEVELOP CONDUCT ADHD DIAG_MULTI;
	by FID;
run;
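
For comparison, here is a minimal sketch of the same collapse done in a single pass, without one dataset per variable. It assumes DIAG_FULL holds FID plus flag variables that are either 1 or missing, as in the example data; the output name DIAG_FLAGS is hypothetical. Using CLASS instead of BY means DIAG_FULL does not have to be sorted first.

```sas
/* Sketch only: assumes the flags in DIAG_FULL are 1 or missing. */
/* DIAG_FLAGS is a hypothetical output name.                     */
proc summary data=DIAG_FULL nway;
	class FID;
	var MOOD ANXIETY DEVELOP CONDUCT ADHD;
	/* max= keeps the original variable names; one row per FID */
	output out=DIAG_FLAGS(drop=_TYPE_ _FREQ_) max=;
run;

data DIAG_FLAGS;
	set DIAG_FLAGS;
	array flags {*} MOOD ANXIETY DEVELOP CONDUCT ADHD;
	/* Recode to 1/0: a missing maximum (no diagnosis recorded) becomes 0 */
	do i = 1 to dim(flags);
		flags{i} = (flags{i} >= 1);
	end;
	drop i;
run;
```

The result can then be merged with DIAG_MULTI by FID in the same way as the per-variable datasets above.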

12 REPLIES
fja
Lapis Lazuli | Level 10
Hello!
Would you like to filter an existing dataset, or do you want to prevent superfluous observations from being inserted in the first place?
--fja
JLang055
Fluorite | Level 6

I'm trying to avoid double-counting a participant in specific categories.

fja
Lapis Lazuli | Level 10

What about defining an index with the UNIQUE option?

--fja

fja
Lapis Lazuli | Level 10

OK, if you just want a "clean" intermediate table, then you could use PROC SORT:

data work.TestData;
	infile datalines dsd;
	input FID MOOD ANXIETY EAT DEVELOP;
datalines;	
1, ,1, , 
1, , ,1, 
1,1, , , 
1,1, , , 
2, , ,1, 
2, ,1, , 
2, , ,1, 
3, ,1, , 
3,1, , , 
;
run;

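/* FID must be part of the key; otherwise rows from different */
/* participants with the same flags are treated as duplicates. */
PROC SORT DATA = work.TestData NODUPKEY out=work.testdata2;
BY FID MOOD ANXIETY EAT DEVELOP;
RUN;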
Tom
Super User

For data like that, with values that are either 1 or missing, I would probably just collapse to one observation per subject.

data want;
  update have(obs=0) have;
  by fid;
run;
Obs    FID    MOOD    ANXIETY    EAT    DEVELOP

 1      1       1        1        1        .
 2      2       .        1        1        .
 3      3       1        1        .        .

 

fja
Lapis Lazuli | Level 10
That results in a saner-looking dataset, agreed. It is just that @JLang055 asked for a different kind of output.
--fja
JLang055
Fluorite | Level 6

I think this could be a good option. When I run the code you suggested, I run into the following error:

NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24         
25         GOPTIONS ACCESSIBLE;
26         data DIAG_FULL_F;
27         	update have(obs=0) have;
ERROR: File WORK.HAVE.DATA does not exist.
ERROR: File WORK.HAVE.DATA does not exist.
28         	by FID;
29         run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.DIAG_FULL_F may be incomplete.  When this step was stopped there were 0 observations and 0 variables.
WARNING: Data set WORK.DIAG_FULL_F was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

Do you have any idea what's going on with it?

Kurt_Bremser
Super User
ERROR: File WORK.HAVE.DATA does not exist.

What could be more clear than this? You need to create the HAVE dataset (the name you used in your initial post) before you can use it, or replace HAVE with the name of your actual dataset.
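
As a sketch of what that might look like: assuming the real data lives in DIAG_FULL (the name used in the accepted solution), the UPDATE step can point at it instead of HAVE. UPDATE needs the data ordered by the BY variable, so this sorts first; DIAG_SORTED is a hypothetical intermediate name.

```sas
/* Sketch only: DIAG_FULL is assumed to be the real input dataset, */
/* DIAG_SORTED is a hypothetical intermediate name.                */
proc sort data=DIAG_FULL out=DIAG_SORTED;
	by FID;
run;

data DIAG_FULL_F;
	/* Empty master (obs=0) plus the full data as transactions:    */
	/* missing transaction values do not overwrite nonmissing ones, */
	/* so each FID collapses to a single observation.              */
	update DIAG_SORTED(obs=0) DIAG_SORTED;
	by FID;
run;
```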

Patrick
Opal | Level 21

Below should create what you're asking for.

proc sort data=have out=want nodupkey;
  by _all_;
run;

Code not tested, because the data was not provided in a form that can be used directly in code (i.e. a fully working SAS data step creating the data).

 

mkeintz
PROC Star

This program will keep the first instance of each unique combination of variables:

 

data have;
	infile datalines dsd;
	input FID MOOD ANXIETY EAT DEVELOP;
datalines;	
1, ,1, , 
1, , ,1, 
1,1, , , 
1,1, , , 
2, , ,1, 
2, ,1, , 
2, , ,1, 
3, ,1, , 
3,1, , , 
;
run;
data want;
  set have;
  if _n_=1 then do;
    declare hash h (dataset:'have (obs=0)');
      h.definekey(all:'Y');
      h.definedone();
  end;
  if h.add()=0 ;
run;

 

 

 

The hash object is "keyed" on all the variables in dataset have (think of it as using a compound index based on the combination of all variables).

 

The hash method ADD will be successful (i.e. h.add()=0) only when there is not already a dataitem (i.e. a "row") in it with the same combination of variables.  As a result the dataset does not even need to be sorted, even by FID.

fja
Lapis Lazuli | Level 10

Congratulations on your first solution then ... but you could at least spare Tom's post a like. 😉
