BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sas-inquirer
Quartz | Level 8

Good afternoon,

     I am sure there is a more efficient way to do this so please don't hesitate to make suggestions on that aspect, but I am trying to compare the question/answer values by male and female to see if they are comparable. To do this, I concatenate the question and answer values onto one line, separate the table into one for male and one for female and then create a 'matched' table and 'unmatched' table using merge. The problem is, when I do this, one ID is not showing up in either table. PRODUCT_ID 5 is not in either table. Can anyone tell me why? It should be included in the matched table.  

 

The result should be:

Matched: PRODUCT_ID's 1,2,4 and 5.

Unmatched: PRODUCT_ID's 3,6.

 

Here is the code:

data have_1;
input product_id gender$ article$ question_id answer_id;
datalines;
1 M SHIRT 1 1
1 M SHIRT 2 1
1 M SHIRT 3 1
2 F SHIRT 1 1
2 F SHIRT 2 1
2 F SHIRT 3 1
3 M SHIRT 1 1
3 M SHIRT 2 2
3 M SHIRT 3 2
4 F PANTS 1 1
4 F PANTS 2 1
4 F PANTS 3 1
5 F PANTS 1 1
5 F PANTS 2 1
5 F PANTS 3 1
6 M PANTS 1 2
6 M PANTS 2 2
6 M PANTS 3 2
run;

data have_2;
length answer_values $1000.;
length question_values $1000.;
do until (last.product_id);
set have_1;
by product_id notsorted;
answer_values = catx(',',answer_values,answer_id);
question_values = catx(',',question_values,question_id);
end;
drop answer_id question_id;
run;

proc sql;
create table mens as
select *
from have_2
where GENDER = 'M'
order by question_values, answer_values;
quit;

proc sql;
create table womens as
select *
from data_2 
where GENDER = 'F'
order by question_values, answer_values;
quit;

data test_matched test_unmatched;
merge womens (in = a) mens (in = b);
by question_values answer_values;
if a and b then output test_matched;
else output test_unmatched;
run;

I'm using SAS 9.4. Any insight would be greatly appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
sas-inquirer
Quartz | Level 8

I decided to take a different approach and it worked. I think I was getting records excluded because it was keeping the first instance of the duplicate. With the following code it takes all the records including the first instance. I also combined the question answer combinations as suggested.

 

data have_2;
length question_answer_values $1000.;
do until (last.product_id);
set have_1;
by product_id notsorted;
question_answer_values = catx(',',question_answer_values,question_id,answer_id);
end;
drop answer_id question_id;
run;

proc sort data = have_2 OUT = HAVE_3 NOUNIQUEKEY ;
BY question_answer_values;
run;

View solution in original post

4 REPLIES 4
ballardw
Super User

I'm not quite sure what your goal with all this matching may be.

I would be strongly tempted, at least at initial exploration of the data, dump this into one of the report procedures and see some overall results such as:

proc tabulate data=have_1;
   class  gender article question_id answer_id/missing;
   table question_id*answer_id,
         gender*(n pctn<question_id>)
         /misstext=' ';
run;

I'm not sure what roll product_id or article might play in this as the your example data has lots of disjoint behavior on product/article/answer_id.

Just one example of using a category of interest, such as gender, to group data by column (or row) to compare things.

 

 

sas-inquirer
Quartz | Level 8
Thanks for your input ballardw. This is a very simplified version of a table I am working with. The end goal is to get a list of IDs that are comparable by ARTICLE between male and female. I have a list of questions by ID and specific answers to those questions (i.e. question "% cotton", answer "50"). The same questions can be attributed to both male and female. I want to find the IDs that have the same question and answer combination so that I can compare apples to apples. I only want to see the IDs that all have an answer of 50% cotton (as well as matching answers to the other questions). In this way, when I compare other values outside of the question/answer say price (not shown here) I can make a true comparison.
ballardw
Super User

@sas-inquirer wrote:
Thanks for your input ballardw. This is a very simplified version of a table I am working with. The end goal is to get a list of IDs that are comparable by ARTICLE between male and female. I have a list of questions by ID and specific answers to those questions (i.e. question "% cotton", answer "50"). The same questions can be attributed to both male and female. I want to find the IDs that have the same question and answer combination so that I can compare apples to apples. I only want to see the IDs that all have an answer of 50% cotton (as well as matching answers to the other questions). In this way, when I compare other values outside of the question/answer say price (not shown here) I can make a true comparison.

Your example data doesn't have your apples to apples because of the values. The male/female do not have all of the combinations.

If you are having and issue with your complete data set with this then your data just is not rich enough to support the approach you are attempting.

 

I would strongly suggest getting some formats for the question and answer values and using CATX with the formatted values instead of the codes. Shouldn't you be combining the question and answer? You say you want " same question and answer combination" but build strings of only answers and only questions. Which doesn't make much sense, partially because you inherit a an order. Is '1,2,3' supposed to be the same as '3,2,1' if the questions are the same?

 

I think you need to demonstrate what you expect to get for " ALL the matches" from the example data as in a data step that will create the data set expected as I really don't understand what you want.

 

 

 

sas-inquirer
Quartz | Level 8

I decided to take a different approach and it worked. I think I was getting records excluded because it was keeping the first instance of the duplicate. With the following code it takes all the records including the first instance. I also combined the question answer combinations as suggested.

 

data have_2;
length question_answer_values $1000.;
do until (last.product_id);
set have_1;
by product_id notsorted;
question_answer_values = catx(',',question_answer_values,question_id,answer_id);
end;
drop answer_id question_id;
run;

proc sort data = have_2 OUT = HAVE_3 NOUNIQUEKEY ;
BY question_answer_values;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 407 views
  • 0 likes
  • 2 in conversation