BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
OlsabeckT29
Fluorite | Level 6

Hello,

I have a list of pneumococcal vaccination administrations (n=2,400,000) and am trying to determine the best way to indicate if each person has met the criteria to be considered complete. The dataset is vertical. I have unique identifiers for each person and if that person has received more than one vaccination then they will have multiple observations listed. To be complete the person will need to meet one of these criteria:

  • PCV15 + PPSV23
  • PCV20
  • PPSV23 + PCV15
  • PPSV23 + PCV20
  • PCV13 + PPSV23
  • PCV13 + PCV20

What would be the most efficient way to handle marking individuals as complete?

Below is an example using dummy data of what I have vs what I want.

HAVE:

IDVaccine
1PCV15
1PCV13
1PPSV23
2PCV13
2PPSV23
3PCV20
4PPSV23
4PCV15
5PCV13

 

WANT (where 1 = complete and 0 = incomplete)

IDPneumo Complete
11
21
31
41
50

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @OlsabeckT29,

 

Your vertical "HAVE" dataset can also be used directly (i.e., without transposing). If it is sorted or at least grouped by ID (in the latter case use by id notsorted;), binary flags for the four relevant vaccines could be set in a DOW loop and the resulting bit pattern be evaluated after the last observation for the current ID has been processed:

 

data want(keep=id complete);
b=0;
do until(last.id);
  set have;
  by id;
  select(vaccine);
    when('PCV20')  b=bor(b,8);
    when('PPSV23') b=bor(b,4);
    when('PCV13')  b=bor(b,2);
    when('PCV15')  b=bor(b,1);
    otherwise;
  end;
end;
complete=b>4;
run;

 

 

View solution in original post

4 REPLIES 4
ballardw
Super User

One way:

Note use of data step to provide example data. You really should provide such as otherwise we guess as to variable types and may not result in code that works with your data.

The transpose places all vaccinations for id on a single record. This requires sorting the data beforehand.

With data in this form any of the combinations involving PCV20 really only use your rule of a single PCV20 is required.

 

data have;
input ID $ Vaccine $;
datalines;
1	PCV15
1	PCV13
1	PPSV23
2	PCV13
2	PPSV23
3	PCV20
4	PPSV23
4	PCV15
5	PCV13
;
proc sort data=have;
   by id ;
run;

proc transpose data=have out=trans
  prefix=vax;
  by id;
  var vaccine;
run;

data want;
   set trans;
   array v (*) vax: ;
   vaxall= catx(' ',of v(*));
   if index(vaxall,'PCV20')>0 then complete=1;
   else if index(vaxall,'PCV15')>0 and index(vaxall,'PPSV23')>0 then complete=1;
   else if index(vaxall,'PCV13')>0 and index(vaxall,'PPSV23')>0 then complete=1;
   else complete=0;
run;

The array is just to create a shorter code for creating a space delimited variable with all of the vaccinations in a single variable. It does not assume how many vaccinations are available. The IF block searches for combinations as described in your question. Index function returns a character position so the >0 is used to find if it is present at all. Note that the function is case sensitive. If any of your actual values are may have different character case you should use

   vaxall= upcase (catx(' ',of v(*)) );

to make all of the letters upper case for the comparison.

If you have not used arrays much the " of v(*) " is a way that many functions will use all of the values in the array.

 

After verifying the logic works you could drop the Vax1 - VaxN and Vaxall variables.

 


@OlsabeckT29 wrote:

Hello,

I have a list of pneumococcal vaccination administrations (n=2,400,000) and am trying to determine the best way to indicate if each person has met the criteria to be considered complete. The dataset is vertical. I have unique identifiers for each person and if that person has received more than one vaccination then they will have multiple observations listed. To be complete the person will need to meet one of these criteria:

  • PCV15 + PPSV23
  • PCV20
  • PPSV23 + PCV15
  • PPSV23 + PCV20
  • PCV13 + PPSV23
  • PCV13 + PCV20

What would be the most efficient way to handle marking individuals as complete?

Below is an example using dummy data of what I have vs what I want.

HAVE:

ID Vaccine
1 PCV15
1 PCV13
1 PPSV23
2 PCV13
2 PPSV23
3 PCV20
4 PPSV23
4 PCV15
5 PCV13

 

WANT (where 1 = complete and 0 = incomplete)

ID Pneumo Complete
1 1
2 1
3 1
4 1
5 0

 

Thank you


 

PaigeMiller
Diamond | Level 26

Slightly simpler:

 

data want;
   set trans;
   if whichc('PCV20', of vax:) then complete=1;
   else if whichc('PCV15',of vax:) and whichc('PPSV23',of vax:) then complete=1;
   else if whichc('PCV13',of vax:) and whichc('PPSV23',of vax:) then complete=1;
   else complete=0;
run;

 

I note that if there is also a code similar to the ones you list but with extra characters on the end, such as PCV20A, and you want complete to be zero in this case, then the solution from @ballardw will fail without additional programming. 

--
Paige Miller
FreelanceReinh
Jade | Level 19

Hello @OlsabeckT29,

 

Your vertical "HAVE" dataset can also be used directly (i.e., without transposing). If it is sorted or at least grouped by ID (in the latter case use by id notsorted;), binary flags for the four relevant vaccines could be set in a DOW loop and the resulting bit pattern be evaluated after the last observation for the current ID has been processed:

 

data want(keep=id complete);
b=0;
do until(last.id);
  set have;
  by id;
  select(vaccine);
    when('PCV20')  b=bor(b,8);
    when('PPSV23') b=bor(b,4);
    when('PCV13')  b=bor(b,2);
    when('PCV15')  b=bor(b,1);
    otherwise;
  end;
end;
complete=b>4;
run;

 

 

Ksharp
Super User
/*
Very interesting question
*/
data have;
input ID $ Vaccine $;
datalines;
1 PCV15
1 PCV13
1 PPSV23
2 PCV13
2 PPSV23
3 PCV20
4 PPSV23
4 PCV15
5 PCV13
;
data have;
 set have;
 dummy=1;
run;
proc glmselect data=have noprint outdesign(addinputvars)=temp;
class Vaccine;
model dummy= Vaccine/selection=none noint;
run;
proc summary data=temp nway;
class id;
var Vaccine_:;
output out=want(drop=_:) max=;
run;
data want;
 set want;
Completed=
(Vaccine_PCV15  and Vaccine_PPSV23) or 
 Vaccine_PCV20                      or 
(Vaccine_PPSV23 and Vaccine_PCV15)  or 
(Vaccine_PPSV23 and Vaccine_PCV20)  or 
(Vaccine_PCV13  and Vaccine_PPSV23) or 
(Vaccine_PCV13  and Vaccine_PCV20) 
;
 run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1378 views
  • 3 likes
  • 5 in conversation