BookmarkSubscribeRSS Feed
sks521
Quartz | Level 8

Hi,

 

I have a data set in the following form;

 

ID    Practice           Age             Status            Sex

1     Bingley             77               Eligible           Female

2     Saltaire             74               Intervention    Male

3     Saltaire             73               Eligible           Male

4     Saltaire             75               Intervention    Female

5     Bingley              72              Eligible           female

 

Saltaire practice has both 'intervention' and 'eligible' group while 'Bingley' has only 'eligible' group. My cases are 'intervention' group under 'status' variable within 'Saltaire' practice. I want to create two different groups of controls; one that has 'eligible' status within 'Saltaire' practice, and second with 'eligible' status within 'Bingley' practice, matching on age and sex and creating 1:1 match. there are other variable in the data set which I am not using for matching.

 

I will appreciate help with the syntax please?

 

Ta

7 REPLIES 7
sks521
Quartz | Level 8

And when I run this syntax to create separate case and control data sets;

 

data study cont1 cont2; *controls2;
set demo;
if practice= "Saltaire medical practice" and status = "interventionGroup" then output STUDY;
else if practice = "Saltaire medical practice" and status = "Eligible" then output cont1;
else output cont2;
RUN;

 

I get this log;

 

NOTE: There were 409 observations read from the data set WORK.DEMO.
NOTE: The data set WORK.STUDY has 0 observations and 16 variables.
NOTE: The data set WORK.CONT1 has 0 observations and 16 variables.
NOTE: The data set WORK.CONT2 has 409 observations and 16 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds

 

 

while I expect 106 observations in study, 119 in cont1 and 180 in cont2.

 

Can someone cast their beady eye and tell me where I am wrong please.

 

Thanks

S

koyelghosh
Lapis Lazuli | Level 10
As per the first screenshot the status is "Intervention" and not "interventionGroup" and I am guessing that in the data set it might actually be Eligible with some padded spaces and not "Eligible" only ... however this is only a guess + the problem here is that in such a case only last ELSE should have executed and you should have seen 409 rows but if I did the math right 106+119+180=405 only... 4 rows discrepancy.
Will you be able to post the full log here?
Thanks you.
sks521
Quartz | Level 8

Yes, absolutely right! its 'intervention' but even correcting that doesn't correct the problem. posted the log in the previous post, posting again;

 


88 data study cont1 cont2; *controls2;
89 set demo;
90 if practice= "Saltaire medical practice" and status = "intervention" then output STUDY;
91 else if practice = "Saltaire medical practice" and status = "Eligible" then output cont1;
92 else output cont2;
93 RUN;

NOTE: There were 409 observations read from the data set WORK.DEMO.
NOTE: The data set WORK.STUDY has 0 observations and 16 variables.
NOTE: The data set WORK.CONT1 has 0 observations and 16 variables.
NOTE: The data set WORK.CONT2 has 409 observations and 16 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds

 

 

Thanks

S

 

 

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @sks521 

 

Same problem with the variable practice as with status. If it's not "Saltaire medical practice" then output goes to cont2. But after your first post ithe value is "Saltaire".

sks521
Quartz | Level 8

Sorry I should've been more precise; the actual values in the data set are 'Saltaire medical practice' and 'Bingley medical practice' and for variable 'status' are 'Eligible' and 'InterventioGroup', I just made them up in my first post to make it look short.

 

But I am assuming something else is wrong with my syntax which I can' figure out.

 

Thanks

S

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @sks521 

 

I can't see any problems with your code. There must be something wrong with your input data, because your code works as expected, if your input data structure and content is as you describe it.

 

* Demo data;
data demo (drop=rec);
	length ID 8 Practice $30 Age 8 Status $20 Sex $6;
	input rec $char80.;
	id = input(scan(rec,1,','),8.);
	Practice = scan(rec,2,',');
	Age = input(scan(rec,3,','),8.);
	Status = scan(rec,4,',');
	Sex = scan(rec,5,',');
datalines;
1,Bingley medical practice,77,Eligible,Female
2,Saltaire medical practice,74,interventionGroup,Male
3,Saltaire medical practice,73,Eligible,Male
4,Saltaire medical practice,75,interventionGroup,Female
5,Bingley medical practice,72,Eligible,female
;
run;

* Your code;
data study cont1 cont2;
set demo;
if practice= "Saltaire medical practice" and status = "interventionGroup" then output STUDY;
else if practice = "Saltaire medical practice" and status = "Eligible" then output cont1;
else output cont2; run;

160 * Your code;
161 data study cont1 cont2;
162 set demo;
163 if practice= "Saltaire medical practice" and status = "interventionGroup" then output STUDY;
164 else if practice = "Saltaire medical practice" and status = "Eligible" then output cont1;
165 else output cont2;
166 run;

NOTE: There were 5 observations read from the data set WORK.DEMO.
NOTE: The data set WORK.STUDY has 2 observations and 5 variables.
NOTE: The data set WORK.CONT1 has 1 observations and 5 variables.
NOTE: The data set WORK.CONT2 has 2 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.00 seconds

 

sks521
Quartz | Level 8

Here is the syntax;

 

data study cont1 cont2; *controls2;
set demo;
if practice= "Saltaire medical practice" and status = "interventionGroup" then output STUDY;
else if practice = "Saltaire medical practice" and status = "Eligible" then output cont1;
else output cont2;
RUN;

 

and log;

75 data study cont1 cont2; *controls2;
76 set demo;
77 if practice='Saltaire medical practice' and status= 'interventionGroup' then output STUDY;
78 else if practice='Saltaire medical practice' and status= 'Eligible' then output cont1;
79 else output cont2;
80 RUN;

NOTE: There were 409 observations read from the data set WORK.DEMO.
NOTE: The data set WORK.STUDY has 0 observations and 16 variables.
NOTE: The data set WORK.CONT1 has 0 observations and 16 variables.
NOTE: The data set WORK.CONT2 has 409 observations and 16 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1603 views
  • 0 likes
  • 3 in conversation