Hi there,
I have data that has pre and post indo. I need a new data that should not include conditions that occurred in pre. For example, ID#1 has Anemia and Diarrhea in pre and post both. The new data needs to delete these observations.
Data have:
ID | DateX | Conditions | Type |
1 | 12/25/2019 | Anemia | Post |
1 | 11/5/2018 | Colitis | Post |
1 | 11/5/2018 | Diarrhea | Post |
1 | 11/24/2019 | Kidney failure/disease | Post |
1 | 12/12/2018 | Neuro | Post |
1 | 5/20/2017 | Anemia | Pre |
1 | 5/20/2017 | Diarrhea | Pre |
1 | 9/14/2017 | Myositis | Pre |
1 | 4/24/2017 | Polyneuropathy | Pre |
2 | 8/19/2020 | Adrenal/Adrenocortical | Post |
2 | 3/23/2020 | Colitis | Post |
2 | 10/7/2019 | Neutropenia | Post |
3 | 7/9/2017 | Dermatitis/Rash | Post |
3 | 4/22/2018 | Liver failure/disease | Post |
3 | 4/5/2015 | Myositis | Post |
3 | 6/8/2015 | Neuro | Post |
Data Want:
ID | DateX | Conditions | Type |
1 | 11/5/2018 | Colitis | Post |
1 | 11/24/2019 | Kidney failure | Post |
1 | 12/12/2018 | Neuro | Post |
1 | 9/14/2017 | Myositis | Pre |
1 | 4/24/2017 | Polyneuropathy | Pre |
2 | 8/19/2020 | Adrenal | Post |
2 | 3/23/2020 | Colitis | Post |
2 | 10/7/2019 | Neutropenia | Post |
3 | 7/9/2017 | Dermatitis/Rash | Post |
3 | 4/22/2018 | Liver failure/disease | Post |
3 | 4/5/2015 | Myositis | Post |
3 | 6/8/2015 | Neuro | Post |
It seems to me that you did not delete the required observations from WANT.
Check next code and adapt to the right requirements:
data have;
infile cards dlm='09'x truncover;
input ID DateX :mmddyy10. Conditions $ Type $;
format datex date9.;
cards;
1 12/25/2019 Anemia Post
1 11/5/2018 Colitis Post
1 11/5/2018 Diarrhea Post
1 11/24/2019 Kidney failure/disease Post
1 12/12/2018 Neuro Post
1 5/20/2017 Anemia Pre
1 5/20/2017 Diarrhea Pre
1 9/14/2017 Myositis Pre
1 4/24/2017 Polyneuropathy Pre
2 8/19/2020 Adrenal/Adrenocortical Post
2 3/23/2020 Colitis Post
2 10/7/2019 Neutropenia Post
3 7/9/2017 Dermatitis/Rash Post
3 4/22/2018 Liver failure/disease Post
3 4/5/2015 Myositis Post
3 6/8/2015 Neuro Post
; run;
proc sort data=have;
by ID conditions type;
run;
data want;
merge have(where=(type='Pre') in=in1)
have(where=(type='Post') in=in2);
by ID conditions type;
if not (in1 and in2);
run;
@newsas007 wrote:
Hi there,
I have data that has pre and post indo. I need a new data that should not include conditions that occurred in pre. For example, ID#1 has Anemia and Diarrhea in pre and post both. The new data needs to delete these observations.
Data have:
ID DateX Conditions Type 1 12/25/2019 Anemia Post 1 11/5/2018 Colitis Post 1 11/5/2018 Diarrhea Post 1 11/24/2019 Kidney failure/disease Post 1 12/12/2018 Neuro Post 1 5/20/2017 Anemia Pre 1 5/20/2017 Diarrhea Pre 1 9/14/2017 Myositis Pre 1 4/24/2017 Polyneuropathy Pre 2 8/19/2020 Adrenal/Adrenocortical Post 2 3/23/2020 Colitis Post 2 10/7/2019 Neutropenia Post 3 7/9/2017 Dermatitis/Rash Post 3 4/22/2018 Liver failure/disease Post 3 4/5/2015 Myositis Post 3 6/8/2015 Neuro Post
Data Want:
ID DateX Conditions Type 1 12/25/2019 Anemia Post 1 11/5/2018 Colitis Post 1 11/5/2018 Diarrhea Post 1 11/24/2019 Kidney failure/disease Post 1 12/12/2018 Neuro Post 1 5/20/2017 Anemia Pre 1 5/20/2017 Diarrhea Pre 1 9/14/2017 Myositis Pre 1 4/24/2017 Polyneuropathy Pre 2 8/19/2020 Adrenal/Adrenocortical Post 2 3/23/2020 Colitis Post 2 10/7/2019 Neutropenia Post 3 7/9/2017 Dermatitis/Rash Post 3 4/22/2018 Liver failure/disease Post 3 4/5/2015 Myositis Post 3 6/8/2015 Neuro Post
Please double check your "want". It looks exactly like Have to me.
Provide your data as data step code pasted into a text box using the </> icon on the forum such as
data have; infile datalines dlm=','; input ID $ DateX :mmddyy10. Conditions :$25. Type :$4.; format datex mmddyy10.; datalines; 1,12/25/2019,Anemia,Post 1,11/5/2018,Colitis,Post 1,11/5/2018,Diarrhea,Post 1,11/24/2019,Kidney failure/disease,Post 1,12/12/2018,Neuro,Post 1,5/20/2017,Anemia,Pre 1,5/20/2017,Diarrhea,Pre 1,9/14/2017,Myositis,Pre 1,4/24/2017,Polyneuropathy,Pre 2,8/19/2020,Adrenal/Adrenocortical,Post 2,3/23/2020,Colitis,Post 2,10/7/2019,Neutropenia,Post 3,7/9/2017,Dermatitis/Rash,Post 3,4/22/2018,Liver failure/disease,Post 3,4/5/2015,Myositis,Post 3,6/8/2015,Neuro,Post ;
I can't tell if you want just the "pre" , just the "post" or both removed.
The following code removes both.
proc sort data=have; by id conditions datex; run; data want; set have; by id conditions; if type='Pre' and first.conditions and not (last.conditions) then delete; if type='Post' and not(first.conditions) and last.conditions then delete; run;
You can remove one of the "if" that you don't need.
This works by getting like values of Conditions in chronological order for each ID. If your Datex is not a numeric date value then you need to make a date value, preferably with an appropriate format for legibility.
The BY statement in the Want data step code creates automatic true/false variables for each of the variables on the By statement. So each record can tell whether the Condition is the first time that appears for each id. If a value is the only one then both First and Last are true.
Warning: If you have 3 values of the same condition for an ID this is likely to not do what you want.
Also if the "Pre" and "Post" are incorrect for chronological order then this will fail.
Both of the above would require a more complete example data, example results and likely additional actual description of rules involved.
You apparently want conditions that appear exclusively in pre, or exclusively in post:
data have;
infile datalines dlm=',';
input ID $ DateX :mmddyy10. Conditions :$25. Type :$4.;
format datex mmddyy10.;
datalines;
1,12/25/2019,Anemia,Post
1,11/5/2018,Colitis,Post
1,11/5/2018,Diarrhea,Post
1,11/24/2019,Kidney failure/disease,Post
1,12/12/2018,Neuro,Post
1,5/20/2017,Anemia,Pre
1,5/20/2017,Diarrhea,Pre
1,9/14/2017,Myositis,Pre
1,4/24/2017,Polyneuropathy,Pre
2,8/19/2020,Adrenal/Adrenocortical,Post
2,3/23/2020,Colitis,Post
2,10/7/2019,Neutropenia,Post
3,7/9/2017,Dermatitis/Rash,Post
3,4/22/2018,Liver failure/disease,Post
3,4/5/2015,Myositis,Post
3,6/8/2015,Neuro,Post
run;
data want;
set have;
if _n_=1 then do;
declare hash pre_conditions (dataset:'have (where=(type="Pre")');
pre_conditions.definekey('id','conditions');
pre_conditions.definedone();
declare hash post_conditions (dataset:'have (where=(type="Post")');
post_conditions.definekey('id','conditions');
post_conditions.definedone();
end;
if (type='Pre' and post_conditions.check()^=0) or
(type='Post' and pre_conditions.check()^=0);
run;
This creates a hash object (think lookup table) for all the pre's, and a separate hash object for all the post's. Then for each incoming pre record, keep it only if that id/condition doesn't appear in the post hash. And vice versa.
or, slightly more efficient, because it constructs one large hash object instead of two smaller ones:
data want;
set have;
if _n_=1 then do;
declare hash all_conditions (dataset:'have');
all_conditions.definekey('id','conditions','type');
all_conditions.definedone();
end;
if (type='Pre' and all_conditions.check(key:id,key:conditions,key:'Post')^=0) or
(type='Post' and all_conditions.check(key:id,key:conditions,key:'Pre')^=0) ;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.