Solved: Re: selecting specific values in a data set

mmea · Posted 02-02-2021 01:35 PM

Hi if I have a data set like this:

Test center	Capacity
Test center Toronto	460
Test center Toronto - close contact	80
Test center Melbourne	6566
Test center Melbourne - close contact	70
Test Center Copenhagen	666
Test Center Vaan	70
Test Center Thisted	152
Test Center Thisted - close contact	12
Test Center A	200
Test Center A - close contact	12

I want to make two new datasets out of this dataset (I have more observations in the real one)

The first dataset should contain all test center who has capacity over 400, but the matching "close contact" center should follow with, even though it can have below 400.

The second dataset should only contain test centers below 400 and the matching close contact center should follow with.

How can I do these statements?

PeterClemmensen · Posted 02-02-2021 02:16 PM

Do this

data have;
input Testplace $ 1-23 Method $ 24-37 Capacity;
datalines;
Test place Toronto     Normal         599 
Test place Toronto     Close contact  23  
Test place Copenhagen  Normal         699 
Test place Copenhagen  Close contact  23  
Test Place Thisted     Normal         159 
Test Place Thisted     Close contact  15  
;

data over under;
    set have;
    if Method = "Close contact" | Capacity >  400 then output over;
    else if Capacity <= 400 then output under;
run;

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PeterClemmensen · Posted 02-02-2021 01:46 PM

You mention that the "Close Contact" values should be in both data sets, is that correct?

The DATA to DATA Step Macro
Blog: SASnrd

mmea · Posted 02-02-2021 01:53 PM

No, if a test place is over 400

the matching close contact center should be in this dataset despite their capacity value.

it should NOT not appear in the dataset for under 400.

PeterClemmensen · Posted 02-02-2021 01:55 PM

Ok. Then do this.

data have;
input TestCenter $ 1-39 Capacity;
datalines;
Test center Toronto                    460 
Test center Toronto - close contact    80  
Test center Melbourne                  6566
Test center Melbourne - close contact  70  
Test Center Copenhagen                 666 
Test Center Vaan                       70  
Test Center Thisted                    152 
Test Center Thisted - close contact    12  
Test Center A                          200 
Test Center A - close contact          12  
;

data over under;
    set have;
    if find(TestCenter, 'close contact', 'i') | Capacity >  400 then output over;
    else if Capacity <= 400 then output under;
run;

The DATA to DATA Step Macro
Blog: SASnrd

mmea · Posted 02-02-2021 02:09 PM

Thanks!!

In a case where the data set can look like this, how will I then do the same:

Test place	Method	Capacity
Test place Toronto	Normal	599
Test place Toronto	Close contact	23
Test place Copenhagen	Normal	699
Test place Copenhagen	Close contact	23
Test Place Thisted	Normal	159
Test Place Thisted	Close contact	15

So a data set for all "normal" over 400 but the matching "close contact" shall follow.

second data set for all "normal" under 400 and with their matching close contact

PeterClemmensen · Posted 02-02-2021 02:16 PM

Do this

data have;
input Testplace $ 1-23 Method $ 24-37 Capacity;
datalines;
Test place Toronto     Normal         599 
Test place Toronto     Close contact  23  
Test place Copenhagen  Normal         699 
Test place Copenhagen  Close contact  23  
Test Place Thisted     Normal         159 
Test Place Thisted     Close contact  15  
;

data over under;
    set have;
    if Method = "Close contact" | Capacity >  400 then output over;
    else if Capacity <= 400 then output under;
run;

The DATA to DATA Step Macro
Blog: SASnrd

mmea · Posted 02-05-2021 07:21 AM

Hi

I did this approach

data over under;
    set test;
    if method= "close contact" | capacity>  400 then output over;
    else if capacity<= 400 then output under;
run;

But I still get some close contact places that has a capacity over 400 in the over dataset. But it should be really be in the under dataset, as the normal place for that close contact place has a capacity under 400?

So if i have this kindof data:

Test place	Method	capacity
test place toronto	Normal	23
test place toronto	close contact	455

then these to testplaces should be in the UNDER dataset as the normal is under 400.

how can I correct for that?

Angel_Larrion · Posted 02-02-2021 02:32 PM

This should work:

data have;
infile datalines dlm=",";
length Test_place $100 Method $20;
input Test_place $ method $ capacity;
datalines;
Test place Toronto,Normal,599
Test place Toronto,Close contact,23
Test place Copenhagen,Normal,699
Test place Copenhagen,Close contact,23
Test Place Thisted,Normal,159
Test Place Thisted,Close contact,15
;
run;


proc sql;
create table want1  as
select have.*, flag
from have as t1 left join (select test_place, 1 as flag
							from have
							where method="Normal" and capacity ge 400) as t2 

				on t1.test_place=t2.test_place;
quit;

data over under;
set want1;
if flag=1 then output over;
else output under;
drop flag;
run;

proc sql;
drop table want1;
run;

Angel_Larrion · Posted 02-02-2021 02:36 PM

The first proc sql creates a flag that indicates the test places that meet the condition (capacity of normal method=>400).

The data step uses this flag to create the two data sets.

mmea · Posted 02-02-2021 02:53 PM

Thank you so much for both of your answers they both worked - Thank you!

mmea · Posted 02-05-2021 10:06 AM

Hi again I revised my question if anyone can help me here

if I have a data set like this:

Test center	Capacity
Test center Toronto	460
Test center Toronto - close contact	80
Test center Melbourne	6566
Test center Melbourne - close contact	70
Test Center Copenhagen	666
Test Center Vaan	70
Test Center Thisted	152
Test Center Thisted - close contact	500
Test Center A	200
Test Center A - close contact	400

I want to make two new datasets out of this dataset (I have more observations in the real one)

The first dataset should contain all test center who has capacity over 400, but the matching "close contact" center should follow with, even though it can have below 400.

The second dataset should only contain test centers below 400 and the matching close contact center should follow with, even though the close contract can be over 400 (e.g. test center A and test center A close contact)

How can I do these statements?

Catch up on SAS Innovate 2026