BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sayno2s
Fluorite | Level 6

Hi there,

 

I have the following data set:

data work.Testdata;
	input Reviewer1 $ Reviewer2 $ Count;
	datalines;
	PersonA PersonB 33
	PersonB PersonA 19
	PersonA PersonC 345
	PersonD PersonB 70
	PersonD PersonC 73
	PersonB PersonC 2
	PersonC PersonB 42
;
run;

A person can function as either Reviewer1 or Reviewer2 for a given observation and there’s one observation per row. The same two people can function as opposite reviewers for two separate observations, e.g., row 1 and 2.

 

My goal is to find occurrences where there are the same two names in two different rows (e.g., row 1 and 2) and count those lines together so I'm left with total counts for each reviewer pair.

 

My desired output (note it doesn't matter if the first or second reviewer name combination is used in the output):

Reviewer1Reviewer2Count
PersonAPersonB52
PersonAPersonC345
PersonDPersonB70
PersonDPersonC73
PersonBPersonC44

 

Being new to SAS & EG (using SAS Enterprise Guide 7.1), I’ve been trying different approaches (e.g., trying to self-join the table using something like where t1.Reviewer1 = t2.Reviewer2 and t1.Reviewer2 = t2.Reviewer1 or creating a variable with all possible reviewer name combinations and then trying to use this in some loop) but I'm always getting stuck.

 

Please help! Thank you so very much in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Your input data has extra leading spaces so that it's "       PersonA" versus "PersonA".

 

Either use the Trim/compress/strip below or clean up the input data in the read in step. 

 

 

 

data work.Testdata;
	input Reviewer1 $ Reviewer2 $ Count;
	datalines;
PersonA PersonB 33
PersonB PersonA 19
PersonA PersonC 345
PersonD PersonB 70
PersonD PersonC 73
PersonB PersonC 2
PersonC PersonB 42
;
run;

data have;
set testdata;
/* reviewer1=strip(trim(compress(reviewer1, , 's'))); */
/* reviewer2=strip(trim(compress(reviewer2, , 's'))); */
call sortc(reviewer1, reviewer2);
run;


ods select none;
proc means data=have nway sum;
class reviewer1 reviewer2;
var count;
ods output summary=want;
run;
ods select all;

proc print data=want;
run;

@sayno2s wrote:

Thank you for your help, but it's not quite what I need.

 

Below is the output from your code. The goal is to sum lines 1+3 to get the total for reviewer pair Person A and Person B (they reviewed two observations as a pair and so the total count is needed) and sum lines 4+5 to get the total for reviewer pair Person B and Person C.

 

Output.PNG

I'd be very grateful if you could take another look.

Thank you so much!


 

View solution in original post

11 REPLIES 11
andreas_lds
Jade | Level 19

Here is one way to solve this:

data ordered;
   set Testdata;
   
   array r Reviewer:;
   
   call sortc(of r[*]);
run;

proc summary data=ordered nway;
   class Reviewer1 Reviewer2;
   var Count;
   output out=want(drop= _type_ _freq_) sum=;
run;
sayno2s
Fluorite | Level 6

Thank you for your help, but it's not quite what I need.

 

Below is the output from your code. The goal is to sum lines 1+3 to get the total for reviewer pair Person A and Person B (they reviewed two observations as a pair and so the total count is needed) and sum lines 4+5 to get the total for reviewer pair Person B and Person C.

 

Output.PNG

I'd be very grateful if you could take another look.

Thank you so much!

Reeza
Super User

Your input data has extra leading spaces so that it's "       PersonA" versus "PersonA".

 

Either use the Trim/compress/strip below or clean up the input data in the read in step. 

 

 

 

data work.Testdata;
	input Reviewer1 $ Reviewer2 $ Count;
	datalines;
PersonA PersonB 33
PersonB PersonA 19
PersonA PersonC 345
PersonD PersonB 70
PersonD PersonC 73
PersonB PersonC 2
PersonC PersonB 42
;
run;

data have;
set testdata;
/* reviewer1=strip(trim(compress(reviewer1, , 's'))); */
/* reviewer2=strip(trim(compress(reviewer2, , 's'))); */
call sortc(reviewer1, reviewer2);
run;


ods select none;
proc means data=have nway sum;
class reviewer1 reviewer2;
var count;
ods output summary=want;
run;
ods select all;

proc print data=want;
run;

@sayno2s wrote:

Thank you for your help, but it's not quite what I need.

 

Below is the output from your code. The goal is to sum lines 1+3 to get the total for reviewer pair Person A and Person B (they reviewed two observations as a pair and so the total count is needed) and sum lines 4+5 to get the total for reviewer pair Person B and Person C.

 

Output.PNG

I'd be very grateful if you could take another look.

Thank you so much!


 

sayno2s
Fluorite | Level 6

Thank you so, so much!

Thank you for the code and for explaining SAS behavior (which led to the removal of special characters in the real data when the code wasn't working)!

 

This is the final code for the real data (just in case this might help someone else):

/* The real data have special characters, which needed to be removed first */
data work.new_input;
    set work.input;
    reviewer1=compress(Reviewer1, , 'ka');
	reviewer2=compress(Reviewer2, , 'ka');
run;

data have;
set new_input;
call sortc(reviewer1, reviewer2);
run;

ods select none;
proc means data=have nway sum;
class reviewer1 reviewer2;
var Count;
ods output summary=want;
run;
ods select all;

/*proc print data=want;*/
/*run;*/

/* I need the output to be another table so I can do some further manipulations */
proc sql;
	create table final as
		select
			Reviewer1,
			Reviewer2,
		    NObs,
		    Count_Sum
	from want;
quit;

/* Yippie! */
Tom
Super User Tom
Super User

You should not have to use compress() function. 

How did you create the original data?  Did you actually use a data step like in your original post?

data work.Testdata;
	input Reviewer1 $ Reviewer2 $ Count;
	datalines;
	PersonA PersonB 33
	PersonB PersonA 19
	PersonA PersonC 345
	PersonD PersonB 70
	PersonD PersonC 73
	PersonB PersonC 2
	PersonC PersonB 42
;

If so then do two things.

1) Make sure to start the lines of data in the FIRST COLUMN on the line.  It will be easier to remember to do this if you also start the DATALINES statement on the first line of the file.

data work.Testdata;
    input Reviewer1 $ Reviewer2 $ Count;
datalines;
PersonA PersonB 33
PersonB PersonA 19
PersonA PersonC 345
PersonD PersonB 70
PersonD PersonC 73
PersonB PersonC 2
PersonC PersonB 42
;

2) Do NOT put TAB characters into your program files. Especially not in the middle of lines of DATA.  

There is no need to place actual TABs in the file as the Editor will indent the number of spaces you want for you without actually inserting the physical TAB character when you hit the TAB key on the keyboard.

For example here is preference screen for the version of SAS/Studio used in SAS ODA 

Tom_0-1663823407083.png

It looks like the software that displays the code on this forum uses 4 spaces for the tab stops.

If you use the EXPANDTABS option of the INFILE statement then SAS will expand the tabs to 8 character tab stops.

I prefer to use 2 spaces per tab stop because it means you can have more levels of nested indents in your program before indentation moves the actual code out of the line of vision of the user trying to read the file. 

 

sayno2s
Fluorite | Level 6

Thanks for the info! I had copied the data from excel and didn't know that SAS is sensitive to that (new to SAS).

 

I think leading/trailing spaces or some other format issue is preventing me to merge tables created with proc sql vs datalines. Do you have a tip how I can check issues/differences with char formatting between a proc sql table vs input created with datalines?

 

The following code doesn't merge tables (only returns the last of four tables while the datalines code below does) and I think it has to do with formatting...?! At least, I don't understand why it doesn't work.

 

/* Original input comes from an excel file; no formatting was specified on the input */
/* There were a couple of other tables created to preprocess data (and get Data_PREPROC), none explicitly stating any formatting */
proc sql; create table Worst_notWorst as select t1.Person1, t1.Person2, t1.Rating1, t1.Rating2 from Data_PREPROC t1 where t1.Rating1 = 'Worst' AND t1.Rating2 ~= 'Worst'; quit; proc sql; create table work.Worst_notWorst_s as select distinct t1.Person1, t1.Person2, count(Rating1) as Count from work.Worst_notWorst group by Person1, Person2; quit; /* I tried these steps below, but they don't make a difference */
/*data work.Worst_notWorst_s_t;*/
/* set work.Worst_notWorst_s;*/
/* Person1=compress(Person1, , 'ka');*/
/* Person2=compress(Person2, , 'ka');*/
/*run;*/ /* OR */
/*data work.PD_PD_s_t;*/
/* set work.PD_PD_s;*/
/* Person1=strip(Person1);*/
/* Person2=strip(Person2);*/
/*run;*/
data work.Worst_notWorst_s_t; set work.Worst_notWorst_s; call sortc(Person1, Person2); run; ods select none; proc means data=work.Worst_notWorst_s_c nway sum; class Person1 Person2; var Count; ods output summary=work.Worst_notWorst_s_c_s; run; ods select all; proc sql; create table work.YN as select Person1, Person2, NObs, Count_Sum from work.Worst_notWorst_s_c_s; quit; /* Create three more tables like this */
/* (This should be done with some macro or for loop, but I'm not yet there with my SAS skills*/ /* Then merge the 4 tables...however, below only returns the last table */ data work.contingency_table; merge work.NN work.NY work.YN work.YY; by Person1 Person2; drop NObs; run;
/* Yet when I create the table with datalines like below then the merge works */
/* Puzzling - is it because I'm explicitly specifying the attr here? */

data YY_test; attrib Person1 length=$50 Person2 length=$50; input Person1 $ Person2 $ NObs YY_Sum; datalines; name1@website.com name2@website.com 1 6 name1@website.com name4@website.com 1 9 name3@website.com name2@website.com 2 99 name3@website.com name4@website.com 1 18 name2@website.com name4@website.com 2 16 ; /* Create three more tables like this... */
/* ...a merged table is created as expected */ data work.contingency_table; merge work.NN work.NY work.YN work.YY; by Person1 Person2; drop NObs; run;

Thank you very much for any suggestions in advance!

 

Tom
Super User Tom
Super User

There is nothing that I can see in your SQL that is of any real concern.

 

But if the original data comes from EXCEL you will probably want to clean it up before trying to combine it.  EXCEL files are NOT databases.  They are spreadsheets. They are DESIGNED to allow the entry of anything into any cell in the sheet.  If you enter leading spaces into a cell then they will be there in the value moved into SAS.

 

There is also no way to tell SAS what length to use when creating character variables.  So it just guesses that the proper length is one that can hold the longest string in that column of this particular spreadsheet.  So when you attempt to combine two datasets created from two spreadsheets the lengths might not match.  That can lead to truncation.

 

To make things worse SAS will (wrongly in my opinion) attach a $ format with a specific width to every character variable it creates from an Excel spreadsheet.  So even if you manage to combine the datasets without truncating the value you might end up with a format attached to the variable that has a width that is too short to display all of the values.  So in addition to making sure the lengths are long enough to combine data from multiple sheets make sure that the formats don't accidentally hide some of the data.  Best is to just remove formats from character variables.

sayno2s
Fluorite | Level 6

Thank you so much for all the good info!

 

Best is to just remove formats from character variables.

Is below what you mean?

 

/* Check formatting */
ods select Variables;
proc contents data=work.have;
run;
ods select default;

/* Remove formatting */
proc datasets lib=work memtype=data;
modify have;
attrib _all_ label=' ';
attrib _all_ format=' ';
attrib _all_ informat=' ';
run;
quit;

 

I'm using guessingrows=3000 at the import stage; SAS is guessing 32 char length, which is indeed the longest name.

Formatting_import.PNG

I've played around with removing the formatting at various stages, but that doesn't change anything.

Do you have an idea what else could be the problem?

 

Tom
Super User Tom
Super User

We seem to have wondered off of the topic of the original question.

 

Did you figure out had to use the COMPRESS() function on the strings to get them to match?

Did you identify any particular value that was not matching?

If so did you look at the actual characters that were in the value?  If you print a character variable using the $HEX format you can see the ASCII codes that it contains. Things like 20 for a space or 41 for an uppercase letter A.  

 

Once you find the offending value check if 

  • It has leading spaces
  • It has trailing carriage return ( 0D in hexadecimal) or line feed ( 0A in hex ).
  • It has endash or emdash characters instead of hyphens
  • It has the stupid things that Microsoft calls "smart" quotes instead of actual quote or double quote characters.
  • etc.
sayno2s
Fluorite | Level 6

Thank you again! I've searched and played around with various potential formatting issues for a good amount of time only to realize that the issue was that the variable to be merged had the same name in the tables created with prior code while the datalines tables did not. UGH! I had assumed that I only need to tell SAS I want to merge BY var1 and var2 and then it would just put the other variables side-by-side. Instead, it just overwrote the tables.

Thanks for all your help!

sayno2s
Fluorite | Level 6
Sorry, I just realized that you weren't the one who responded last. Thank you so, so much for the initial code!

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3659 views
  • 3 likes
  • 4 in conversation