BookmarkSubscribeRSS Feed
NewUsrStat
Lapis Lazuli | Level 10

Dear all, 

 

I have a dataset with rules I have to apply. 

For example, suppose to have the following dummy dataset: 

 

data DB;
  input ID :$20. Line $8. Other_trt :$20.;
cards;
0001 2nd line  Y
0001 3rd line   Y
0001 1st line   Y
0001 2nd line  N
0002 1st line   Y
0003 3rd line  Y
0003 1st line  N
0004 2nd line Y
;
run;

I have to do the following: flag IDs not in (1st line) that have at least one "Other_trt" entry = "Y". 

 

I'm used to do: "proc sort" and then "first.*" Since I'm in a data step with many other rules on the main dataset, I would like to not to end the data step with "run" to do proc sort and the other stuffs  but instead do the job in the same data step. 

 

Desired output: 

 

data DB;
  input ID :$20. Line $8. Other_trt :$20.  Flag  $8.;
cards;
0001 2nd line  Y 1
0001 3rd line   Y 0
0001 1st line   Y  0
0001 2nd line  N  0
0002 1st line   Y  0
0003 3rd line  Y  1
0003 1st line  N  0
0004 2nd line Y  1
;
run;

Rule: "AT LEAST ONE ID with Other_trt = "Y" with (having) Line not in 1st line"

 

How can I deal with it?

 

Thank you in advance

6 REPLIES 6
ballardw
Super User

Since it sounds like you need to mark multiple observations (not specified how) it would likely take two or possibly more passes depending on other rules using a data step.

 

I can come up with at least two different meanings for " flag IDs not in (1st line) that have at least one "Other_trt" entry = "Y". " It would be a good idea to show the desired result to make sure any interpretation we use would be correct for the problem as well as showing the value(s) of any flags set.

 

Note that there quite a few places where using a numeric 1/0 instead of text Y/N or True/False are much easier to deal with. For example, SAS treats numeric 1 as True or Yes and 0 as False or No so you can SUM a number of values and get the count of Yes values. And the result of a comparison is returned as 1/0  so you don't have to add in logic to assign Y/N. A format can display the text value of "Y" or "N" or other text as desired for reporting for people to read.

 

If this were my problem I would likely use a separate step with either Proc SQL or possibly Proc Freq to get the desired counts of Other_trt values and merge that result with the DB data set by ID values to add in that Flag.

NewUsrStat
Lapis Lazuli | Level 10
Thank you very much for your reply. I edited the question with the desired output.
PaigeMiller
Diamond | Level 26

The advice from @ballardw to represent Y and N with numeric values 1 and 0 should not be ignored. This applies to all SAS programming that you do now and in the future. If you are creating output (like a report or table) that you need to show others and Y and N is required in the output, then use a custom format to display 1 as Y and 0 as N.

 

However, I am still unable to understand the stated logic that takes me from the input data to the desired data. It simply is not clear. 

 

Please try to write it again, in a new post in this thread (instead of correcting the original post) explaining the rules (in detail) you want to follow. As @ballardw would say, we need RULES and not EXAMPLES (although sometimes we need both).

--
Paige Miller
Kurt_Bremser
Super User

You write that you want to flag whole ID's, but then you have the flags set only for particular observations within an ID.

Please describe in detail for every observation in your result dataset why the flag is set as it is.

Post this in a new answer, do not modify your original post! This is needed so it's easier to follow the flow of discussion.

Tom
Super User Tom
Super User

If you want to flag ID's then why is the value of FLAG different on some of the observations for ID 0001 and 0003?

 

If not then why isn't FLAG=1 on the second observation?

I has OTHER_TRT='Y' and it is not '1st line'.

 

In general if you want to generate a new variable for all of a BY group you have two choices.

  • Have all of the information needed for calculation available on the FIRST observation in the group.  Then RETAIN the NEW variable so its value stays the same on all observations in the BY group.
  • Use two do loops to process the observations for each BY group twice. In the first loop gather the information needed to make the calculation.  And in the second loop re-read and output the observations so all of the observations in the BY group get the newly calculated value.
data want;
  do until (last.id);
    set have;
    by id;
    * gather information ;
  end;
  * calculations at the BY group level ;
  do until (last.id);
    set have;
    by id;
    * calculations at the observation level ;
    output;
  end;
run;

 

Ksharp
Super User

As Tom said, the second obs in your desired output should be flag=1.

And using DOW skill suggested by Tom.

data DB;
  input ID :$20. Line $8. Other_trt :$20.;
cards;
0001 2nd line  Y
0001 3rd line   Y
0001 1st line   Y
0001 2nd line  N
0002 1st line   Y
0003 3rd line  Y
0003 1st line  N
0004 2nd line   Y
;
run;

data want;
 count=0;
 do until(last.id);
  set DB;
  by id;
  if Line ne '1st line' and Other_trt='Y' then count+1;
 end;
 do until(last.id);
  set DB;
  by id;
  flag=0;
  if count>0 and Line ne '1st line' and Other_trt='Y' then flag=1;
  output;
 end;
drop count;
run;

Catch up on SAS Innovate 2026

Dive into keynotes, announcements and breakthroughs on demand.

Explore Now →
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 520 views
  • 1 like
  • 6 in conversation