I have a patient dataset with multiple observations and variables. I need to delete overlapping (duplicate) observations within each participant. See the example below.
HAVE
FID | MOOD | ANXIETY | EAT | DEVELOP |
1 |   | 1 |   |   |
1 |   |   | 1 |   |
1 | 1 |   |   |   |
1 | 1 |   |   |   |
2 |   |   | 1 |   |
2 |   | 1 |   |   |
2 |   |   | 1 |   |
3 |   | 1 |   |   |
3 | 1 |   |   |   |
WANT
FID | MOOD | ANXIETY | EAT | DEVELOP |
1 |   | 1 |   |   |
1 |   |   | 1 |   |
1 | 1 |   |   |   |
2 |   |   | 1 |   |
2 |   | 1 |   |   |
3 |   | 1 |   |   |
3 | 1 |   |   |   |
I'm trying to prevent double counting a participant in specific categories.
What about defining an index with the UNIQUE option?
--fja
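This is a minimal sketch of the unique-index idea, using the example data from this thread. It creates an empty copy of the table and puts a UNIQUE composite index on all of its variables. PROC APPEND then brings the rows in. The names DEDUP and DXKEY are hypothetical.

A unique index is not expected to be creatable directly on a table that already contains duplicates. SAS should instead reject appended observations that would duplicate an existing key. The exact log message is not reproduced here, so check the log after running it.

```sas
/* Example data from this thread */
data have;
  infile datalines dsd;
  input FID MOOD ANXIETY EAT DEVELOP;
datalines;
1, ,1, ,
1, , ,1,
1,1, , ,
1,1, , ,
2, , ,1,
2, ,1, ,
2, , ,1,
3, ,1, ,
3,1, , ,
;
run;

/* Empty copy of HAVE with the same variables (DEDUP is a hypothetical name) */
data dedup;
  set have(obs=0);
run;

/* Composite UNIQUE index over every variable (DXKEY is a hypothetical name) */
proc datasets lib=work nolist;
  modify dedup;
  index create dxkey=(FID MOOD ANXIETY EAT DEVELOP) / unique;
quit;

/* Observations whose key combination is already in DEDUP should be rejected; see the log */
proc append base=dedup data=have;
run;
```

With the example data, DEDUP should end up with 7 observations: the repeated MOOD row for FID 1 and the repeated EAT row for FID 2 are rejected.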
OK, if you just want a "clean" intermediate table, then you could use PROC SORT:
data work.TestData;
infile datalines dsd;
input FID MOOD ANXIETY EAT DEVELOP;
datalines;
1, ,1, ,
1, , ,1,
1,1, , ,
1,1, , ,
2, , ,1,
2, ,1, ,
2, , ,1,
3, ,1, ,
3,1, , ,
;
run;
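PROC SORT DATA = work.TestData NODUPKEY out=work.testdata2;
/* BY every variable, starting with FID, so duplicates are only removed within a participant */
BY FID MOOD ANXIETY EAT DEVELOP;
RUN;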
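A variant of the PROC SORT step above can make it easy to see what was dropped. It sorts BY every variable (FID first), so rows only count as duplicates within the same participant. DUPOUT= then writes the removed rows to a separate table. The name work.removed_dups is hypothetical.

```sas
/* Example data from this thread */
data work.TestData;
  infile datalines dsd;
  input FID MOOD ANXIETY EAT DEVELOP;
datalines;
1, ,1, ,
1, , ,1,
1,1, , ,
1,1, , ,
2, , ,1,
2, ,1, ,
2, , ,1,
3, ,1, ,
3,1, , ,
;
run;

/* NODUPKEY keeps the first row per BY combination; DUPOUT= keeps the discarded ones */
proc sort data=work.TestData out=work.testdata2
          nodupkey dupout=work.removed_dups;
  by FID MOOD ANXIETY EAT DEVELOP;
run;
```

With this data, work.testdata2 should hold 7 rows and work.removed_dups the 2 duplicates. Note that the output is ordered by the BY variables, not by the original row order.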
For data like that, with values that are either 1 or missing, I would probably just collapse it to one observation per subject.
data want;
update have(obs=0) have;
by fid;
run;
Obs  FID  MOOD  ANXIETY  EAT  DEVELOP
  1    1     1        1    1        .
  2    2     .        1    1        .
  3    3     1        1    .        .
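The goal in this thread is to avoid counting a participant twice in the same category. A possible follow-up step (not part of the original reply) relies on the collapse above: after the UPDATE, each FID contributes at most one 1 per variable, so a column sum counts distinct participants rather than records. This sketch assumes the example data from this thread. DX_COUNTS is a hypothetical name.

```sas
/* Example data from this thread (already sorted by FID) */
data have;
  infile datalines dsd;
  input FID MOOD ANXIETY EAT DEVELOP;
datalines;
1, ,1, ,
1, , ,1,
1,1, , ,
1,1, , ,
2, , ,1,
2, ,1, ,
2, , ,1,
3, ,1, ,
3,1, , ,
;
run;

/* Collapse to one row per FID: missing values in the transaction do not overwrite */
data want;
  update have(obs=0) have;
  by fid;
run;

/* Column sums now count participants, not diagnosis records */
proc means data=want noprint;
  var MOOD ANXIETY EAT DEVELOP;
  output out=dx_counts(drop=_TYPE_ _FREQ_) sum=;
run;
```

With the example data, the sums should be MOOD 2, ANXIETY 3 and EAT 2, with DEVELOP missing because no participant has that value. This matches the printed output above.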
I think this could be a good option. When I run the code you suggested, I get the following error:
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24
25 GOPTIONS ACCESSIBLE;
26 data DIAG_FULL_F;
27 update have(obs=0) have;
ERROR: File WORK.HAVE.DATA does not exist.
ERROR: File WORK.HAVE.DATA does not exist.
28 by FID;
29 run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.DIAG_FULL_F may be incomplete. When this step was stopped there were 0 observations and 0 variables.
WARNING: Data set WORK.DIAG_FULL_F was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
Do you have any idea what's going on with it?
ERROR: File WORK.HAVE.DATA does not exist.
What could be clearer than this? You need to create the HAVE dataset (the name you used in your initial post) before you can use it.
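The other option is to point the step at the dataset you actually have. Here is a sketch that assumes your data is in DIAG_FULL, the name used in your macro code. The data step at the top is only a hypothetical stand-in built from the thread's example data, and DIAG_SORTED is a hypothetical name. UPDATE with a BY statement expects the data to be sorted by FID, hence the PROC SORT.

If DIAG_FULL contains other variables besides the 1/missing flags, UPDATE keeps the last non-missing value of each one per FID.

```sas
/* Hypothetical stand-in for DIAG_FULL; skip this step with your real data */
data DIAG_FULL;
  infile datalines dsd;
  input FID MOOD ANXIETY EAT DEVELOP;
datalines;
1, ,1, ,
1, , ,1,
1,1, , ,
1,1, , ,
2, , ,1,
2, ,1, ,
2, , ,1,
3, ,1, ,
3,1, , ,
;
run;

/* UPDATE ... BY FID needs its input in FID order */
proc sort data=DIAG_FULL out=DIAG_SORTED;
  by FID;
run;

/* Empty master plus full transaction: one row per FID, non-missing values win */
data DIAG_FULL_F;
  update DIAG_SORTED(obs=0) DIAG_SORTED;
  by FID;
run;
```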
The code below should create what you're asking for.
proc sort data=have out=want nodupkey;
by _all_;
run;
Code not tested, because the data was not provided in a form that can be used directly in code (i.e., a working SAS data step that creates the data).
This program will keep the first instance of each unique combination of variables:
data have;
infile datalines dsd;
input FID MOOD ANXIETY EAT DEVELOP;
datalines;
1, ,1, ,
1, , ,1,
1,1, , ,
1,1, , ,
2, , ,1,
2, ,1, ,
2, , ,1,
3, ,1, ,
3,1, , ,
;
run;
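data want;
  set have;
  /* On the first iteration, create a hash object keyed on every variable in HAVE */
  if _n_=1 then do;
    declare hash h (dataset:'have (obs=0)');
    h.definekey(all:'Y');
    h.definedone();
  end;
  /* ADD() returns 0 only for a combination not seen before: keep just those rows */
  if h.add()=0 ;
run;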
The hash object is "keyed" on all the variables in dataset have (think of it as using a compound index based on the combination of all variables).
The hash method ADD succeeds (i.e., h.add()=0) only when the hash object does not already contain a data item (i.e., a "row") with the same combination of values. As a result, the dataset does not need to be sorted, not even by FID.
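The same ADD return code can also flag duplicates instead of deleting them. That is useful for checking which records would be dropped before removing anything. This is a minimal sketch on the example data. FLAGGED and IS_DUP are hypothetical names. ALL:'yes' keys the hash on every variable in the DATASET: argument, so IS_DUP itself is not part of the key.

```sas
/* Example data from this thread */
data have;
  infile datalines dsd;
  input FID MOOD ANXIETY EAT DEVELOP;
datalines;
1, ,1, ,
1, , ,1,
1,1, , ,
1,1, , ,
2, , ,1,
2, ,1, ,
2, , ,1,
3, ,1, ,
3,1, , ,
;
run;

data flagged;
  set have;
  if _n_ = 1 then do;
    /* Key on all variables of HAVE; obs=0 loads only the variable definitions */
    declare hash h(dataset: 'have(obs=0)');
    h.defineKey(all: 'yes');
    h.defineDone();
  end;
  /* ADD() is non-zero when this key combination was already added */
  is_dup = (h.add() ne 0);
run;
```

All 9 rows are kept. IS_DUP should be 1 for exactly 2 of them: the second MOOD row for FID 1 and the second EAT row for FID 2.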
Hi All,
I took @Tom's suggestion to collapse all the data into a single row per participant, and wrote some code that worked for me.
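%macro count(var); /* create a 0/1 flag per participant for one diagnosis */
  /* Sum the diagnosis variable within each FID (BY requires DIAG_FULL sorted by FID) */
  proc summary data=DIAG_FULL;
    var &var;
    by FID;
    output out=&var sum=;
  run;

  /* Recode: 1 if the diagnosis appears at least once for the participant, else 0 */
  data &var;
    set &var;
    if &var >= 1 then &var = 1;
    else &var = 0;
    drop _TYPE_ _FREQ_;
  run;
%mend count;
%count(MOOD);
%count(ANXIETY);
%count(DEVELOP);
%count(CONDUCT);
%count(ADHD);
/* Combine the per-diagnosis flags (and DIAG_MULTI) into one row per FID */
data CLINIC_DIAG;
  merge MOOD ANXIETY DEVELOP CONDUCT ADHD DIAG_MULTI;
  by FID;
run;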
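The macro runs a PROC SUMMARY and a DATA step for each diagnosis and then merges the results. A possible single-pass alternative is sketched below under some assumptions:

- The thread's example data stands in for DIAG_FULL, so MOOD ANXIETY EAT DEVELOP replace MOOD ANXIETY DEVELOP CONDUCT ADHD.
- FLAGS is a hypothetical output name.

CLASS with NWAY gives one row per FID without requiring sorted input. The array applies the macro's recode (1 if the sum is at least 1, else 0) to every variable at once.

```sas
/* Example data from this thread, standing in for DIAG_FULL */
data have;
  infile datalines dsd;
  input FID MOOD ANXIETY EAT DEVELOP;
datalines;
1, ,1, ,
1, , ,1,
1,1, , ,
1,1, , ,
2, , ,1,
2, ,1, ,
2, , ,1,
3, ,1, ,
3,1, , ,
;
run;

/* One summary over all diagnosis variables; NWAY keeps only the per-FID rows */
proc summary data=have nway;
  class FID;
  var MOOD ANXIETY EAT DEVELOP;
  output out=flags(drop=_TYPE_ _FREQ_) sum=;
run;

/* Same recode as the macro: a missing sum compares as less than 1, so it becomes 0 */
data flags;
  set flags;
  array dx{*} MOOD ANXIETY EAT DEVELOP;
  do i = 1 to dim(dx);
    dx{i} = (dx{i} >= 1);
  end;
  drop i;
run;
```

With the example data, FLAGS should have one row per FID (3 rows), MOOD = 1 for FIDs 1 and 3, and DEVELOP = 0 throughout. Any other per-participant table, such as DIAG_MULTI, could then be merged BY FID as in the macro version.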
Congratulations on your first solution, then ... but you could at least spare Tom's post a like. 😉