BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Epi_Stats
Obsidian | Level 7

Hi,

 

I have a results dataset with study participant IDs and string result values. The data were recorded over different time period, hence there are multiple rows for each ID, and some IDs have the same result value at different time periods (e.g. for ID #1 resultB is red-apples at both time periods),

 

 

data have;
input ID resultA :$200. resultB :$200. ;
datalines;
1 green-apples red-apples
1 red-apples red-apples
2 green-apples yellow-pears
2 red-apples yellow-pears
3 green-apples red-apples
3 green-apples green-apples
4 red-apples green-apples
4 red-apples red-apples
;
run;

 

I want to generate a unique list of all string result values, and then determine how many IDs had this result (so collapsing the result string values by ID/reporting the result string by total number of IDs with this result...). From my searching on the forum so far, I think I need to first append the resultA and result B columns..

 

data Appended;
set have;
array NAMES resultA--resultB;
do i = 1 to dim(NAMES);
	if not missing (NAMES{i}) then do;
	All_Names = NAMES{i};
	output;
	end;
end;
drop resultA--resultB i;
run;

And then perhaps use proc report to generate a summary, similar to a pivot table in Excel

 

proc report data=Appended;
column All_Names id ;
define All_Names/group "Names";
define ID/group "id";
run;

But I'm not sure if this is the most efficient way to achieve what I want (dataset below)

data want;
input Names :$200. ID_count ;
datalines;
green-apples 4 
red-apples 4
yellow-pears 1
;
run;

Any help much appreciated. Happy to clarify if above unclear.

 

Thank you,

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

I find counting is usually easily done in PROC FREQ.

 

proc freq data=appended;
	table id*all_names/noprint out=_counts1_;
run;
proc freq data=_counts1_;
	table all_names/noprint out=want;
run;

 

--
Paige Miller

View solution in original post

10 REPLIES 10
ballardw
Super User

Your idea for reshaping is correct though I wouldn't think of it as appending.

 

See if this is what you need:

proc sql;
   create table want as
   select all_names, count(*)
   from (select distinct all_names,id from Appended)
   group by all_names
   ;
quit;

The sub-query in the From, the (select distinct all_names,id from Appended) gets unique combinations of the all_names and id. the outer select with the group by gets the count of those unique by the values of all_names.

 

If the data set is real large sometimes the distinct bit can take a bit of time as there is sorting and comparisons going on in the background.

PaigeMiller
Diamond | Level 26

I find counting is usually easily done in PROC FREQ.

 

proc freq data=appended;
	table id*all_names/noprint out=_counts1_;
run;
proc freq data=_counts1_;
	table all_names/noprint out=want;
run;

 

--
Paige Miller
Epi_Stats
Obsidian | Level 7

Thank you very much, @ballardw  and @PaigeMiller  - both of your responses are practical and readily solve my question (I wish I could accept them both as the solution!). 

 

Thank you very much again for your help.

Epi_Stats
Obsidian | Level 7

One related query I have is, how I can reconcile reverse-order results in the dataset (e.g. ID #2 apples-red instead of red-apples). For example, based on the above data and code, 

 

data have;
input ID resultA :$200. resultB :$200. ;
datalines;
1 green-apples red-apples
1 red-apples red-apples
2 green-apples yellow-pears
2 apples-red yellow-pears
3 green-apples red-apples
3 green-apples green-apples
4 red-apples green-apples
4 red-apples red-apples
;
run;

/*APPEND resultA--resultB COLUMNS */
data Appended;
set have;
array NAMES resultA--resultB;
do i = 1 to dim(NAMES);
	if not missing (NAMES{i}) then do;
	All_Names = NAMES{i};
	output;
	end;
end;
drop resultA--resultB i;
run;

proc sql;
   create table want as
   select all_names, count(*)
   from (select distinct all_names,id from Appended)
   group by all_names
   ;
quit;

/*ALTERNATIVELY, USING PROC FREQ*/
proc freq data=appended;
	table id*all_names/noprint out=_counts1_;
run;
proc freq data=_counts1_;
	table all_names/noprint out=want;
run;

You can see that the frequency count reports for 3 "red-apples" and 1 for "apples-red", but I really just want the results to show for "apples-red" format - perhaps I need to clean the results data so the order of the 2 words either side of the hyphen is consistent throughout the results... can you recommend an efficient way to do this?

 

Thank you again,

PaigeMiller
Diamond | Level 26

In a SAS data step, after your create data set APPENDED (or even while you are creating APPENDED), split the text string into two words at the hyphen, so you would get (depending on the original text string)

 

word1=apples

word2=red

 

OR

 

word1=red

word2=apples

 

Then sort the values of WORD1 and WORD2 into alphabetical order, using CALL SORTC() In this case, apples will always be first and red will always be second.

 

Then combine WORD1 and WORD2 back into one text string and run the double PROC FREQ as before.

--
Paige Miller
patterroni
Calcite | Level 5
/* Create the dataset APPENDED */
data APPENDED;
input textstring $50.;
datalines;
apples-red
red-apples
grapes-green
blueberries-blue
;
run;

/* Process the dataset to split and sort the words */
data APPENDED;
set APPENDED;

/* Split the text string into two words at the hyphen */
word1 = scan(textstring, 1, '-');
word2 = scan(textstring, 2, '-');

/* Sort the words into alphabetical order */
call sortc(word1, word2);

/* Combine the words back into one string */
newtextstring = catx('-', word1, word2);
run;

/* First PROC FREQ on original textstring */
proc freq data=APPENDED;
tables textstring / nocum nopercent;
run;

/* Second PROC FREQ on the new textstring */
proc freq data=APPENDED;
tables newtextstring / nocum nopercent;
run;
ballardw
Super User

My first reaction to seeing your example was "why are there two values in one variable". That almost never bodes well for good analysis or reporting.

 

This question then also brings up the possible case of what if you only have "apples" or "red" but not both for some value?

 

Epi_Stats
Obsidian | Level 7

Hi @ballardw , the output data are numerical and linked to a reference file containing "two values in one variable", as you note. The examples are just that, as I can't share my data. 

Tom
Super User Tom
Super User

You can sort them, but if you only have two values then a simple IF/THEN achieve the ordering.

Just stick it into the step that is generating the list of all values.

...
All_Names = NAMES{i};
if scan(all_names,1,'-') > scan(all_names,2,'-') then 
  all_names = catx('-',scan(all_names,2,'-'),scan(all_names,1,'-'))
;
...
Epi_Stats
Obsidian | Level 7

Thanks Tom, this is pretty much how I ended up doing it based on @PaigeMiller 's helpful guidance

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 3351 views
  • 7 likes
  • 5 in conversation