I have the following sample data below:
data test_a;
infile datalines dlm= ',';
input id reg_date:date9. startdate:date9. enddate:date9. extdate:date9.;
format reg_date startdate enddate extdate date9.;
datalines;
005,01JUL2017,31JUl2016,3JUL2017,31AUG2018
007,25NOV2017,31JUL2016,31JUL2017,31OCT2018
001,12MAR2018,31JAN2016,31JAN2017,30APR2018
002,05JUN2018,29FEB2016,28FEB2017,31MAY2018
;
run;
and I create flag variables using the following code to get the dataset below.
Everything works fine but I have alot of data and I would like to know if there is a
code that can be run to ensure that only one category is being flagged per ID .
I know that if/then should only flag in one category (like it does below) but I would
like to do an additional check.
data test_b;
set test_a;
if startdate le reg_date le enddate then withinyr = 1;
else if enddate le reg_date le extdate then withingrace = 1;
else outsidetimeframe = 1;
run;
Obs id reg_date startdate enddate extdate withinyr withingrace outsidetimeframe
1 5 01JUL2017 31JUL2016 03JUL2017 31AUG2018 1 . .
2 7 25NOV2017 31JUL2016 31JUL2017 31OCT2018 . 1 .
3 1 12MAR2018 31JAN2016 31JAN2017 30APR2018 . 1 .
4 2 05JUN2018 29FEB2016 28FEB2017 31MAY2018 . . 1
You could have another flag like
if sum(withinyr,withingrace, outsidetimeframe)>1 then flag='Alert';
That is assuming i understand what you mean
data test_b;
set test_a;
if startdate le reg_date le enddate then withinyr = 1;
else if enddate le reg_date le extdate then withingrace = 1;
else outsidetimeframe = 1;
if sum(withinyr,withingrace, outsidetimeframe)>1 then flag='Alert';
run;
I just wanna make sure that if a ID is counted in withinyr then it is not also counted as withingrace.
I just want to make sure that a id is not counted in more than one category.
@novinosrin wrote:
You could have another flag like
if sum(withinyr,withingrace, outsidetimeframe)>1 then flag='Alert';
That is assuming i understand what you mean
data test_b; set test_a; if startdate le reg_date le enddate then withinyr = 1; else if enddate le reg_date le extdate then withingrace = 1; else outsidetimeframe = 1; if sum(withinyr,withingrace, outsidetimeframe)>1 then flag='Alert'; run;
Since @luvscandy27 shouldn't even assign values to more than one variable then it may be worth considering if more than one value was assigned to any of the group of variables.
if N(withinyr,withingrace, outsidetimeframe)>1 then flag='Alert';
or even not exactly = 1 meaning no flag was set (if you expect exactly one to be set for every record).
if N(withinyr,withingrace, outsidetimeframe) ne 1 then flag='Alert';
Agreed. I got confused.
@novinosrin wrote:
Agreed. I got confused.
Nothing wrong with your suggestion, just opening some thought lines about what might actually be needed. I'm not sure that OP is really sure yet.
I may not be explaining this correctly I apologize. Looking at the example below I want to make sure
that for a id it only flags one category. If I have id 5(pictured below) and it flags in more
than one category then i would like to create a check that would alert me to that.
obs id reg_date startdate enddate extdate withinyr withingrace outsidetimeframe
1 5 7/1/2017 7/31/2016 7/3/2017 8/31/2018 1 1 .
HI @luvscandy27 Much better. Can you explain how you want to design the Flag , or in other words how you want it. Some will have PUT texts writing to LOG. Some will have flag variables with variable names captured, Some will have an audit data set and many others.
Alternatively wisdom comes from experience, @ballardw might suggest something regarding the design
@luvscandy27 wrote:
I may not be explaining this correctly I apologize. Looking at the example below I want to make sure
that for a id it only flags one category. If I have id 5(pictured below) and it flags in more
than one category then i would like to create a check that would alert me to that.
obs id reg_date startdate enddate extdate withinyr withingrace outsidetimeframe
1 5 7/1/2017 7/31/2016 7/3/2017 8/31/2018 1 1 .
Any of the suggested SUM(<of your flag variables>) >1 or N (<of your flag variables>) discussed above will add a variable you could use to find any of those. Add the example line to the data step code that creates the flag variables AFTER they should be assigned.
To find those you can use another data step or report proc with a WHERE clause or statement:
For example.
proc print data=have;
where flag=<what ever value was set>;
run;
Or if you read your log (you do don't you) then you could use:
If N (withinyr, withingrace, outsidetimeframe) > 1 then put "WARNING: Multiple flag variables set for " id= +1 reg_date=;
OR your need to be much more explicit about what you mean by "create a check".
With your code, you can only get one flag per record. Unless something is seriously wrong with basic logic, or with the basic SAS datastep compiler, or with your computer.
But if your data contains multiple records for the same ID, and you want to find the IDs where different records have different flags, that's another story.
One solution could be this:
proc sql;
select * from test_b
group by ID
having sum(max(withinyr),max(withingrace),max(outsidetimeframe))>1;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.