BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kmardinian
Quartz | Level 8

Hi, I am writing a program where the dataset I am using contains multiple variables with check marks (X) in them.

 

ID  RAPID  WASH  POOR   OTHER    SUM

1      X                      X                          2

2      X                                                  1    

3                  X          X                          3  

 

I would like to write a code that whenever SUM equals 2 or 3, another variable is created that lists the variable names that have been check marked.

 

So far I came up with this, but cannot find a way to make it so that the new variable includes RAPID and POOR for ID #1

 

data K;

set V (keep=ID SUM RAPID WASH POOR OTHER);

length VQ $1000;

if VQ=2 then do;

if RAPID ne '  ' then VQ="Rapid cleaning";

if WASH ne '  ' then VQ="Quick Wash";

if POOR ne '  ' then VQ='Poor cleaning';

if OTHER ne '  ' then VQ="Other';

output; end;

if VQ=3 then do;

if RAPID ne '  ' then VQ="Rapid cleaning";

if WASH ne '  ' then VQ="Quick Wash";

if POOR ne '  ' then VQ='Poor cleaning';

if OTHER ne '  ' then VQ="Other';

output; end;

drop RAPID WASH POOR OTHER;

Run;

 

Any help is greatly appreciated! Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
jdwaterman91
Obsidian | Level 7

This is how I would produce the output you are looking for here:

 

Data K;
Set V (Keep = ID Sum Rapid Wash Poor Other);
Length VQ $25;

If Sum > 1 Then Do;

If Rapid = 'X' Then VQ = 'Rapid';
If Wash = 'X' Then VQ = Catx(', ', VQ, 'Wash');
If Poor = 'X' Then VQ = Catx(', ', VQ, 'Poor');
If Other = 'X' Then VQ = Catx(', ', VQ, 'Other');

End;

Keep ID Sum VQ;

Run;

View solution in original post

10 REPLIES 10
PaigeMiller
Diamond | Level 26

@kmardinian wrote:

Hi, I am writing a program where the dataset I am using contains multiple variables with check marks (X) in them.

 

ID  RAPID  WASH  POOR   OTHER    SUM

1      X                      X                          2

2      X                                                  1    

3                  X          X                          3  

 


Just so I'm 100% clear on what you mean, modify the table and type in the desired output

--
Paige Miller
kmardinian
Quartz | Level 8

I would like the output to look like the following

 

 

ID               SUM                VQ

1                   2                 Rapid, Poor

2                   1      

3                   3                 Wash, Poor

 

 

Thank you so much for your help!

PaigeMiller
Diamond | Level 26

Just so we don't go down a wrong path, what are you planning to do with variable VQ after it is created? Is the goal simply to put this table into some kind of report, or are there additional analyses planned?

--
Paige Miller
kmardinian
Quartz | Level 8

VQ would be used to create a report using Proc report later on in my code

jdwaterman91
Obsidian | Level 7

This is how I would produce the output you are looking for here:

 

Data K;
Set V (Keep = ID Sum Rapid Wash Poor Other);
Length VQ $25;

If Sum > 1 Then Do;

If Rapid = 'X' Then VQ = 'Rapid';
If Wash = 'X' Then VQ = Catx(', ', VQ, 'Wash');
If Poor = 'X' Then VQ = Catx(', ', VQ, 'Poor');
If Other = 'X' Then VQ = Catx(', ', VQ, 'Other');

End;

Keep ID Sum VQ;

Run;
kmardinian
Quartz | Level 8

Thank you so much, that worked perfectly!

ballardw
Super User

An extremely common approach to data like this is to actually read the data so that instead of X or Y or check mark or some other character to read it such that the character of interest in your data is read as a numeric where 1= X, yes, true, selected or whatever the mark means and 0 means not selected, false, no and so on.

Then your question becomes nearly trivial:

 

sum = sum(rapid, poor, wash,other);

 

An example with a custom informat to read a character into numeric and some display formats.

proc format library=work;
invalue check
'X'=1
other=0
;
value rapid
1='Rapid cleaning'
other=' ';
value wash
1='Quick wash'
other=' ';
value poor
1='Poor cleaning'
other=' ';
value other
1='Other'
other=' ';
run;

data example;
  infile datalines dlm=',' dsd;
  informat id $3. rapid wash poor other check.;
  input id rapid wash poor other;
  length vq $100.;
  sum=sum(rapid, wash, poor, other);
  if sum in (2,3) then vq= catx(',',put(rapid,rapid.),put(wash,wash.),put(poor,poor.),put(other,other.));
datalines;
1,X,,X, 
2,X,,, 
3,,X,X, 
4,,,,X
;
run;

Why might you want to have those values as 1 and 0 you might ask. The SUM of the variable in a procedure such as proc means, summary, tabulate or report would be the number of X's recorded and the Mean would be the percentage chosen for each variable. Also some modeling procedures require a numeric value, most will by default drop records with missing values for any of the variables on the model, class or by statements.

 

 

The custom formats to create the VQ phrase could also work by changing

value rapid

1='Rapid cleaning'

other=' ';

to

value $rapid

X='Rapid cleaning'

other=' ';

and using the $ version of the format in the put function call.

 

If you want to keep the X then you could use

sum=length(catt(rapid, wash,poor,other));

kmardinian
Quartz | Level 8

Thank you so much for your help!

i did realize that I misstyped my first post and that it should be whether SUM equals 2 or 3 not VQ (VQ is the new variable I was trying to create). But I will try and adjust the code you have provided me with to make it work!

kmardinian
Quartz | Level 8

 

If I am using a dataset, how would I code it without coding the actual datalines. And also, how would I call the format I've just created?

 

proc format library=work;
invalue check
'X'=1
other=0
;
value rapid
1='Rapid cleaning'
other=' ';
value wash
1='Quick wash'
other=' ';
value poor
1='Poor cleaning'
other=' ';
value other
1='Other'
other=' ';
run;

I used the above code and then added the code below

 

data K;
set V(keep=ID RAPID WASH POOR OTHER);
length VQ $100;
vq=vq(RAPID, WASH, POOR, OTHER);
if sum in (2,3) then vq=catx(',', put(RAPID, RAPID.), put(WASH, WASH.), put(POOR, POOR.),  put(OTHER, OTHER.));
format RAPID RAPID. WASH WASH. POOR POOR. OTHER OTHER.;
run;

 

And I seem to be getting this error and note as well:

ERROR 68-185: The function VQ is unknown, or cannot be accessed.

NOTE 484-185: Format $RAPID was not found or could not be loaded

ballardw
Super User

If I am using a dataset, how would I code it without coding the actual datalines. 

If your data set is some form of text file then the INFILE would reference your external file in place of data lines and have appropriate information such as the delimiter. I used a comma but others are possible. Then there would not be any DATALINES in the code to read your external file.

EACH of the variables to be read using the CHECK informat would have to be indicated. The input statement needs to match the order of your data.

If you source is a spreadsheet to the easiest would be to

1) do a file SAVE AS and select CSV (if not spreadsheet skip this)

2) Use either the import wizard or proc import to bring in the file. Use the option guessingrows with a very large value (number of rows in file is best if possible).

3) either of the import wizard or proc import will have data step code in the log. COPY that from the log and paste into the editor. You may need to remove line numbers..

4) change all of the informat statements for variables that likely came have a $1. to the CHECK. informat created that used X for the value.

5) save the program .

6) rerun the program to reread the data with the new informat.

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 10585 views
  • 0 likes
  • 4 in conversation