SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Subsetting identical observations that differ by one variable

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Subsetting identical observations that differ by one variable

Hello guys,

 

My data looks like the attached screenshot. I have observations that have identical values by some variables but differ by another variable "viralresult". For example, the "order number" and "test type" have the same values but the viral test results are different (see attached screenshot). I'm trying to keep only one observation for each order number and test type, based on the test result variable (e.g. keep one of orderno:1033730189, test: Respiratory Virus Panel PCR that has a positive test result). I am using SAS University Edition (Version: university.cny.sas.com@sas:university-3p.2/3p.2.f23fd5825fb4-1-1)

 

Hope you can help!

 

 


Screen Shot 2016-12-23 at 8.51.33 PM.png

Accepted Solutions
Solution
‎12-25-2016 12:03 PM
Trusted Advisor
Posts: 1,137

Re: Subsetting identical observations that differ by one variable

Posted in reply to Doyinsola
sorry but you are supposed to use the informat $ord. and create the code variable and then sort the data by code like below

proc format;
invalue $ord
'positive'=1
'inconclusive'=2
'negative'=3;
run;


data want;
set have;
code=input(viralresult,$ord.);
run;

proc sort data=want;
by orderno test code;
run;


data want;
set have;
by orderno test code;
if first.test;
run;
Thanks,
Jag

View solution in original post


All Replies
Super User
Super User
Posts: 7,076

Re: Subsetting identical observations that differ by one variable

Posted in reply to Doyinsola

Please post data as text, preferable as a data step that can be used to recreate it.  I cannot copy and paste a photograph of your data into my program editor.

 

Occasional Contributor
Posts: 7

Re: Subsetting identical observations that differ by one variable

Thank you Tom,

 

So here are my two different scenarios:

 

DATA have;
        INPUT orderno  test  $ viralresult $;
        DATALINES;
              1033730189  Respiratory Virus Panel PCR  negative
              1033730189  Respiratory Virus Panel PCR  positive
              1126522178  Respiratory Virus Panel PCR  inconclusive
              1126522178  Respiratory Virus Panel PCR  negative
              ;
Run;

1) If the test results are both negative and positive then I want to keep the positive one.

2) If the results are both inconclusive and negative, I want to keep the inconclusive one.

Trusted Advisor
Posts: 1,137

Re: Subsetting identical observations that differ by one variable

Posted in reply to Doyinsola

Consider that the data is sorted by orderno test viralresult. Then i use

 

data want;

set have;

by orderno test viralresult;

if last.test;

run;

 

Thanks,
Jag
Occasional Contributor
Posts: 7

Re: Subsetting identical observations that differ by one variable

Posted in reply to Jagadishkatam

Thank you Jagadishkatam,

 

So here are my two different scenarios:

 

 

DATA have;
        INPUT orderno test $ viralresult $;
        DATALINES;
              1033730189  Respiratory Virus Panel PCR  negative
              1033730189  Respiratory Virus Panel PCR  positive
              1126522178  Respiratory Virus Panel PCR  inconclusive
              1126522178  Respiratory Virus Panel PCR  negative
              ;
Run;

1) If the test results are both negative and positive then I want to keep the positive one.

2) If the results are both inconclusive and negative, I want to keep the inconclusive one.

 

Based on your suggestion I would only be able to retain the positive observation in scenario 1 and the negative observation in scenario 2 right?

Trusted Advisor
Posts: 1,586

Re: Subsetting identical observations that differ by one variable

Posted in reply to Doyinsola

If your preffered order of viralresult, low to high, has just those 3 options:

     inconclusive - negative - positive

then, that is same order as the alphanumeric sort:

     proc sort data=have; by orderno test viralresult;

     data want; set have; by orderno test; if last.test; run;

 

if there are more result options then you need assign a prefference code to each viralresult

and select by that code.

 

Trusted Advisor
Posts: 1,137

Re: Subsetting identical observations that differ by one variable

[ Edited ]

I also wanted to check what if the test result is inconclusive and positive. As per my assumption you will consider the positive record. But would like to check.

 

consider that your order of preference is positive followed by inconclusive and last negative. In this case you need to derive a numeric code with the preferred order with proc format like below then I sort the data by code and use first.test to get the expected result

 

proc format;

invalue $ord

'positive'=1

'inconclusive'=2

'negative'=3;

run;

 

data want;

set have;

code=input(viralresult,$ord.);

run;

 

proc sort data=want;

by orderno test code;

run;

 

data want;

set have;

by orderno test code;

if first.test;

run;

Thanks,
Jag
Occasional Contributor
Posts: 7

Re: Subsetting identical observations that differ by one variable

Posted in reply to Jagadishkatam

Yes, that is typically the order of preference but for some reasone I can't get the code to work. 

 

This portion of the code gives the following error:

Data have;
	Set want;
	Code = input (viralresult, best.);
Run;

 NOTE: Invalid argument to function INPUT at line 58 column 9.

 I circumvented this and assign the preferred numeric code a different way. Following this, I ran the next portion of the code (using If first.testSmiley Wink and obtained an empty dataset with no observations.

Occasional Contributor
Posts: 7

Re: Subsetting identical observations that differ by one variable

This would work, but for the fact that I prefer "positive" over "inconclusive" so it will no longer follow the natural alphanumeric ordering.

Solution
‎12-25-2016 12:03 PM
Trusted Advisor
Posts: 1,137

Re: Subsetting identical observations that differ by one variable

Posted in reply to Doyinsola
sorry but you are supposed to use the informat $ord. and create the code variable and then sort the data by code like below

proc format;
invalue $ord
'positive'=1
'inconclusive'=2
'negative'=3;
run;


data want;
set have;
code=input(viralresult,$ord.);
run;

proc sort data=want;
by orderno test code;
run;


data want;
set have;
by orderno test code;
if first.test;
run;
Thanks,
Jag
Occasional Contributor
Posts: 7

Re: Subsetting identical observations that differ by one variable

Posted in reply to Jagadishkatam

Thanks Jag,

 

The final portion of the code below returns an empty dataset though.

Data want;
	Set have;
	If first.test;
Run;
Trusted Advisor
Posts: 1,137

Re: Subsetting identical observations that differ by one variable

Posted in reply to Doyinsola
yes it will return empty dataset, when you are using the first. then you are supposed to use the by statement, but you did not mention the by statement.

please try this code

proc sort data=want;
by orderno test code;
run;

data want;
set have;
by orderno test code;
if first.test;
run;
Thanks,
Jag
Occasional Contributor
Posts: 7

Re: Subsetting identical observations that differ by one variable

Posted in reply to Jagadishkatam
Perfect! Thank you for your help, it worked just fine.
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 517 views
  • 3 likes
  • 4 in conversation