- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Greetings, I'm fairly new to SAS and I was wondering if this would be possible
On the image that I have attached, I am showing a column from an excel file in which several polymorphisms have been found in some patient samples and are separated by commas. I was wondering if there was a way to extract each polymorphism into its own column as and then count how many of those repeat or are unique.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
COUNTW() will tell you the number of items you have.
SCAN() will extract each item
OUTPUT will create each line.
data long;
set have;
ID = _n_; * you likely don't need this;
n_loop = count(pr_poly);
do i=1 to n_loop;
polyMorph = scan(pr_poly, i, ",");
output;
end;
run;
To count distinct now:
proc sql;
create table want as
select ID, count(distinct polyMorph) as n_distinct
from long
group by ID;
quit;
Your exact data structure may require some modifications to the code, but hopefully that gives you a good idea.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I modified the first part of the code because I think you meant to use COUNTW() instead of COUNT()
I have attached a screenshot with some of the results, i was wondering if the idea here was to extract only one value into the polyMorph column according to the value of i. Would it be possible of instead of having one row per polymoprhism, you could have several columns for each polymorphism.
data long;
set mydata;
id=_n_;
n_loop=countw(Pr_poly, ",");
do i=1 to n_loop;
polyMorph=scan(pr_poly, i, ',');
output;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I sort of managed to take them into their own column using PROC TRANSPOSE. I suppose I could use other procedures that allow me to count each individual polymorphism
proc transpose data=long out=ordered prefix=polymorph;
by id;
var polymorph;
run;
Eventually I would like to associate all the different PR_polymorphisms under a single category
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To continue this thread in case somebody faces the same issue (or as a reference to myself who might need it later), I managed to count unique polymorphisms after I transposed them each one in their own column using the following code
data PrWant;
set hivfull.pr_polymorph;
array pr_polymorph{24}pr_polymorph1-pr_polymorph24;
array new {24} $20 _temporary_;
do _n_=1 to 24;
new{_n_}=pr_polymorph{_n_};
end;
call sortc(of new{*});
count=(new{1}>'');
do _n_=2 to 24;
if new{_n_} ne new{_n_-1} then
count + 1;
end;
run;
The next step would be to evaluate which polymorphisms are the most common regardless of where they are placed in the column, it is likely that polymorphism A in column 23 could be found in a different column number for another patient, which is why i think it would've been more useful to have each polymorphism as its own column name and then rows would check if it is present or not.