BookmarkSubscribeRSS Feed
VKWS
Calcite | Level 5

Hello,

 I have a wide format dataset and then each ID has 6 columns from different surveys (i.e., each variable are binary and simply indicate whether the participants responded to that survey, 0= not responded and 1=responded) . I am creating some descriptive stats to understand how many people had 1 survey missing and how many people had 2 survey missing and so on (or the other way round as how many people had responded certain number of surveys). So what I have done was first creating a string variable so that I have a pattern of the response for each participant like before.

 

Data response_finalsample_;
	SET  response_finalsample_;
	length responsex $ 6;
	responsex = cat (of s1-s6);
RUN;

The above is successful as I can see I've created a new column showing the string of the response like "110101" "000011" etc

But the problem arises as I would like to create a new variable so that it shows how many '0' or '1' in the string variable as above.

I thought I could use count function but it does not work because the new created variable did not match the actual number of responses of a participant.

 

DATA response_finalsample_;
	count_miss = countc(responsex, '1');
	output;
RUN;

It would be really helpful if someone could point the direction of doing this.
Thank you!

 

2 REPLIES 2
Patrick
Opal | Level 21

One way is to remove all the 1 (or zeros) from the string and then determine the length of the remaining string.

data test;
  infile datalines truncover;
  input var $10.;
  n0=lengthn(compress(var,'0 '));
  n1=lengthn(compress(var,'1 '));
  datalines;
00001010
01010010
010
 
000  00
;

proc print data=test;
run;

Patrick_0-1663041680703.png

 

ballardw
Super User

Binary really should be numeric as the summary functions are ever so useful.

If your S1 to S6 are numeric then

 

sum(of s1-s6) = Number of 1s

n(of s1-s6) - sum(of s1-s6) = Number of zeros

n(of s1-s6) = number of responses

 

You character version also loses information for any observation where not all of the 6 variables were populated. Quick if you see "1011" which 4 of the 6 had a response? You get the same result for S1=1 S2=0 S3=1 and S4=1 as for S1=1 S2=missing S3=0 S4=1 and S5=1,

S1=1 S2=missing S3 = missing S4=0 S5=1 and S6=1 and a few more combinations.

 

If your s1 to s6 are numeric then you likely had problems with what catx does with the conversion to character as without some additional control you would have blanks in the converted values.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 263 views
  • 1 like
  • 3 in conversation