BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
stats_auca
Obsidian | Level 7

I have been trying for hours so I decided to reach out for help.

 

Below is an example of my data. I created a flag if the Code is A by using if then do. However, I also need to flag all rows of the same VID if the code was ever A. I have tried using a data step but I cannot seem to figure out anything that works. Any help is appreciated.

 

Example Have:

   ID  VID  Code  CodeA

   1       5         A          1

   1       5         B 

   2      6         A          1

   2      6         B

   2      3         C

   3      4         B

   3      4         C

 

Example Want:

   ID  VID  Code  CodeA

   1       5         A          1

   1       5         B          1

   2      6         A          1

   2      6         B          1

   2      3         C        

   3      4         B

   3      4         C

or Example want:

   ID  VID  Code  CodeA  VID_CODEA

   1       5         A          1                   1

   1       5         B                               1

   2      6         A          1                   1

   2      6         B                               1

   2      3         C                              

   3      4         B

   3      4         C

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Double DO loop:

data want;
do until (last.vid);
  set have;
  by id vid notsorted;
  if code = "A" then codea = 1;
end;
do until (last.vid);
  set have;
  by id vid notsorted;
  output;
end;
run;

View solution in original post

15 REPLIES 15
maguiremq
SAS Super FREQ

This one is always a bit tricky. If you ever need to find this again, you would call it last observation carried forward.

 

data have;
infile datalines delimiter = "," dsd missover;
input ID $  VID  Code $ CodeA;
datalines;
1,5,A,1
1,5,B, 
2,6,A,1
2,6,B,
2,3,C,
3,4,B,
3,4,C,
;
run;

proc sort data = have;
	by id vid code;
run;

data want;
set have;
length vid_codea 3.;
retain vid_codea;
by id vid;
	if code = 'A' then codeA = 1;
	if first.vid then call missing(vid_codeA);
	if not missing(codeA) then vid_codeA = codeA;
run;
ID	VID	Code	CodeA	vid_codea
1	5	A	1	1
1	5	B	.	1
2	3	C	.	.
2	6	A	1	1
2	6	B	.	1
3	4	B	.	.
3	4	C	.	.

 

stats_auca
Obsidian | Level 7

@maguiremq Thank you so much. To be clear, column vid_codea is what I am looking for, correct? If so, it seemed to work perfectly!

 

stats_auca
Obsidian | Level 7

@maguiremq I spoke too soon. Below is the dataset and there should be a '1' for the bolded lines with the same VID when Code=1.  Any ideas what may be the issue?

ID	VID	 Code	CodeA	vid_codea
1	5	  A	      1	       1
1	5	  B	      .	       .
2	3	  C	      .        .
2	6	  A	      1        1
2	6	  B		  .        .
3	4	  B		  .        .
3	4	  C		  .        .
mkeintz
PROC Star

As long as your data are already sorted by ID, it doesn't matter where the code "A" occurs.  You can always do a MERGE of the code="A" obs with all the obs.  No further sorting is needed.

 

Code below is edited, and tested:

 

data have;
infile datalines delimiter = "," dsd missover;
input ID $  VID  Code $ CodeA;
datalines;
1,5,A,1
1,5,B, 
2,6,A,1
2,6,B,
2,3,C,
3,4,B,
3,4,C,
run;
data want;
  merge have  (where=(code='A') in=found_a_qualifier)
        have (in=allobs);
  by id;
  retain vid_codea;
  if first.id then vid_codea=found_a_qualifier; /*Changed first.have to first.id */
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
stats_auca
Obsidian | Level 7

@mkeintz thank you! for some reason when I run the code, both the vid_codea and found_a_qualifier are empty columns?

 

mkeintz
PROC Star

@stats_auca wrote:

@mkeintz thank you! for some reason when I run the code, both the vid_codea and found_a_qualifier are empty columns?

 


Yes.  I have corrected the code to use     first.id  instead of first.have.  This is the result of my failure to test the code against your sample data.  I've edited the response, with tested code.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
vidhipatel01
Calcite | Level 5
You can subset your first data:
Data Data_A;
Set Data_Have;
Where code = “A”;
Run;
Example: Data_A
ID VID Code CodeA
1 5 A 1
2 6 A 1

After that use proc sql to assign a new flag as you want:
Proc sql noprint;
Create table Data_A_want as
Select *, case when VID in (select distinct VID from Data_A) then 1 end else 0 as VID_ CODEA
From Data_Have;
Quit;

The above code should give you the flag you want.
Kurt_Bremser
Super User

Double DO loop:

data want;
do until (last.vid);
  set have;
  by id vid notsorted;
  if code = "A" then codea = 1;
end;
do until (last.vid);
  set have;
  by id vid notsorted;
  output;
end;
run;
stats_auca
Obsidian | Level 7
This code seemed to do the trick. Thank you so much!
Tom
Super User Tom
Super User

To test if a Boolean is ever TRUE just take the MAX().  This will work with your 1/. coded variable just as well as with a normal 1/0 coded Boolean variable.

data have;
  input ID $  VID  Code $ CodeA;
datalines;
1 5 A 1
1 5 B . 
2 6 A 1
2 6 B .
2 3 C .
3 4 B .
3 4 C .
;

proc sql ;
create table want as
  select *
       , max(codeA) as any_codeA
  from have
  group by id
;
quit;

proc print;
run;

Result

                            Code     any_
Obs    ID    VID    Code      A     codeA

 1     1      5      A        1       1
 2     1      5      B        .       1
 3     2      6      A        1       1
 4     2      3      C        .       1
 5     2      6      B        .       1
 6     3      4      C        .       .
 7     3      4      B        .       .

But to do other aggregations like are ALL of the observations TRUE you need to have real Boolean flags where the FALSE results are coded as zero instead of missing.

data have;
  input ID $  VID  Code $ ;
datalines;
1 5 A
1 5 B
2 6 A
2 6 B
2 3 C
3 4 B
3 4 C
4 7 A
4 7 A
;

proc sql ;
create table want as
  select *
       , (code='A') as codeA
       , max(calculated codeA) as any_codeA
       , min(calculated codeA) as all_codeA
  from have
  group by id
;
quit;
                            code     any_     all_
Obs    ID    VID    Code      A     codeA    codeA

 1     1      5      B        0       1        0
 2     1      5      A        1       1        0
 3     2      6      A        1       1        0
 4     2      3      C        0       1        0
 5     2      6      B        0       1        0
 6     3      4      C        0       0        0
 7     3      4      B        0       0        0
 8     4      7      A        1       1        1
 9     4      7      A        1       1        1

 

 

SaSoDude
Obsidian | Level 7

If I may drop in?

 

 input ID $  VID  Code $ ;

Is there really a sense for the "$" after ID?

Tom
Super User Tom
Super User

The $ means read it as character.  Since I was lazy and did not DEFINE the variables before they appeared in the INPUT statement I need to have that or else it will make ID as a NUMERIC variable. 

 

You should not store identifiers as numbers as there is no reason to do arithmetic with them.  What would the MEAN value of ID mean?

mkeintz
PROC Star

@Tom wrote:

You should not store identifiers as numbers as there is no reason to do arithmetic with them.  What would the MEAN value of ID mean?

 

Agree.

 

But there may be value in knowing the MINIMUM and MAXIMUM ID values (or maybe the range), if ID's were historically assigned in numeric sequence.  That's not enough in my mind to justify numeric ID's, when character vars could also be sequenced.  But I suspect a lot of ID systems have started out with such numeric sequences.  I've certainly seen a lot of surveys do exactly that.

 

 


 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
SaSoDude
Obsidian | Level 7

Thank you @Tom  - I can follow your explanation, but still have a different view on declaring IDs.

In this case the values of this ID take only numeric values and for this reason alone I see no advantage to declare the ID as a character.

I also know from other BI tools that especially with IDs numbers can be processed much faster than characters. Whether this is also the case in SAS, I do not know - I am still too new.

 

Best regards!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 2577 views
  • 2 likes
  • 7 in conversation