SAS Procedures

CliveWarren · Posted 07-02-2019 01:47 PM

Greetings, I'm fairly new to SAS and I was wondering if this would be possible

On the image that I have attached, I am showing a column from an excel file in which several polymorphisms have been found in some patient samples and are separated by commas. I was wondering if there was a way to extract each polymorphism into its own column as and then count how many of those repeat or are unique.

Reeza · Posted 07-02-2019 01:51 PM

I would actually change your data structure by creating multiple lines, one for each polymorphism and then count them that way.

COUNTW() will tell you the number of items you have.
SCAN() will extract each item
OUTPUT will create each line.

data long;
set have;
ID = _n_; * you likely don't need this;
n_loop = count(pr_poly);
do i=1 to n_loop;
polyMorph = scan(pr_poly, i, ",");
output;
end;

run;

To count distinct now:

proc sql;
create table want as
select ID, count(distinct polyMorph) as n_distinct
from long
group by ID;
quit;

Your exact data structure may require some modifications to the code, but hopefully that gives you a good idea.

CliveWarren · Posted 07-02-2019 02:30 PM

I modified the first part of the code because I think you meant to use COUNTW() instead of COUNT()

I have attached a screenshot with some of the results, i was wondering if the idea here was to extract only one value into the polyMorph column according to the value of i. Would it be possible of instead of having one row per polymoprhism, you could have several columns for each polymorphism.

data long;
	set mydata;
	id=_n_;
	n_loop=countw(Pr_poly, ",");

	do i=1 to n_loop;
		polyMorph=scan(pr_poly, i, ',');
		output;
	end;
run;

Reeza · Posted 07-02-2019 02:32 PM

The trick with that is you need to know the maximum number of columns ahead of time. SAS can't add columns dynamically. So you either need to know that ahead of time, but then there's no easy way to count distinct within a row either or count duplicates so a long format gives you what you need. It is possible to do what you want, but it's more code.

CliveWarren · Posted 07-02-2019 03:02 PM

I sort of managed to take them into their own column using PROC TRANSPOSE. I suppose I could use other procedures that allow me to count each individual polymorphism

proc transpose data=long out=ordered prefix=polymorph;
	by id;
	var polymorph;
run;

Eventually I would like to associate all the different PR_polymorphisms under a single category Eventually I would like to associate all the different PR_polymorphisms under a single category

CliveWarren · Posted 07-03-2019 01:16 PM

To continue this thread in case somebody faces the same issue (or as a reference to myself who might need it later), I managed to count unique polymorphisms after I transposed them each one in their own column using the following code

data PrWant;
	set hivfull.pr_polymorph;
	array pr_polymorph{24}pr_polymorph1-pr_polymorph24;
	array new {24} $20  _temporary_;
	do _n_=1 to 24;
		new{_n_}=pr_polymorph{_n_};
	end;
	call sortc(of new{*});
	count=(new{1}>'');

	do _n_=2 to 24;

		if new{_n_} ne new{_n_-1} then
			count + 1;
	end;
run;

The next step would be to evaluate which polymorphisms are the most common regardless of where they are placed in the column, it is likely that polymorphism A in column 23 could be found in a different column number for another patient, which is why i think it would've been more useful to have each polymorphism as its own column name and then rows would check if it is present or not.

SAS Procedures

Counting similar comma separated values in columns

Re: Counting similar comma separated values in columns

Re: Counting similar comma separated values in columns

Re: Counting similar comma separated values in columns

Re: Counting similar comma separated values in columns

Re: Counting similar comma separated values in columns

Follow Us

What is...

SAS Procedures

Counting similar comma separated values in columns

Re: Counting similar comma separated values in columns

Re: Counting similar comma separated values in columns

Re: Counting similar comma separated values in columns

Re: Counting similar comma separated values in columns

Re: Counting similar comma separated values in columns

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...