BookmarkSubscribeRSS Feed
CliveWarren
Calcite | Level 5

Greetings, I'm fairly new to SAS and I was wondering if this would be possible

On the image that I have attached, I am showing a column from an excel file in which several polymorphisms have been found in some patient samples and are separated by commas. I was wondering if there was a way to extract each polymorphism into its own column as and then count how many of those repeat or are unique. 

Capture.PNG

5 REPLIES 5
Reeza
Super User
I would actually change your data structure by creating multiple lines, one for each polymorphism and then count them that way.

COUNTW() will tell you the number of items you have.
SCAN() will extract each item
OUTPUT will create each line.


data long;
set have;
ID = _n_; * you likely don't need this;
n_loop = count(pr_poly);
do i=1 to n_loop;
polyMorph = scan(pr_poly, i, ",");
output;
end;

run;

To count distinct now:

proc sql;
create table want as
select ID, count(distinct polyMorph) as n_distinct
from long
group by ID;
quit;

Your exact data structure may require some modifications to the code, but hopefully that gives you a good idea.
CliveWarren
Calcite | Level 5

I modified the first part of the code because I think you meant to use COUNTW() instead of COUNT()

 

I have attached a screenshot with some of the results, i was wondering if the idea here was to extract only one value into the polyMorph column according to the value of i. Would it be possible of instead of having one row per polymoprhism, you could have several columns for each polymorphism. 

 

data long;
	set mydata;
	id=_n_;
	n_loop=countw(Pr_poly, ",");

	do i=1 to n_loop;
		polyMorph=scan(pr_poly, i, ',');
		output;
	end;
run;

Capture.PNG

Reeza
Super User
The trick with that is you need to know the maximum number of columns ahead of time. SAS can't add columns dynamically. So you either need to know that ahead of time, but then there's no easy way to count distinct within a row either or count duplicates so a long format gives you what you need. It is possible to do what you want, but it's more code.
CliveWarren
Calcite | Level 5

I sort of managed to take them into their own column using PROC TRANSPOSE. I suppose I could use other procedures that allow me to count each individual polymorphism

proc transpose data=long out=ordered prefix=polymorph;
	by id;
	var polymorph;
run;

Capture.PNGEventually I would like to associate all the different PR_polymorphisms under a single categoryEventually I would like to associate all the different PR_polymorphisms under a single category

CliveWarren
Calcite | Level 5

To continue this thread in case somebody faces the same issue (or as a reference to myself who might need it later), I managed to count unique polymorphisms after I transposed them each one in their own column using the following code

data PrWant;
	set hivfull.pr_polymorph;
	array pr_polymorph{24}pr_polymorph1-pr_polymorph24;
	array new {24} $20  _temporary_;
	do _n_=1 to 24;
		new{_n_}=pr_polymorph{_n_};
	end;
	call sortc(of new{*});
	count=(new{1}>'');

	do _n_=2 to 24;

		if new{_n_} ne new{_n_-1} then
			count + 1;
	end;
run;

Capture.PNG

The next step would be to evaluate which polymorphisms are the most common regardless of where they are placed in the column, it is likely that polymorphism A in column 23 could be found in a different column number for another patient, which is why i think it would've been more useful to have each polymorphism as its own column name and then rows would check if it is present or not.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 5271 views
  • 1 like
  • 2 in conversation