This is raw data ‘Tour’
ID | City | Country | Tour |
A | Yes | Yes |
|
A | No | No |
|
A | Yes | Yes |
|
B | No | No |
|
B | Yes | Yes |
|
C | No | No |
|
D | Yes | Yes |
|
I would like to make my codes as following:
1. If people who had ID had visited city & country, I would like to check Tour=’Yes’ (I have to count Tour=‘Yes’)
2. I would like to delete the observation of which Tour = No if people have a record of ‘Tour=Yes’
3. If people had not visited any city & country, I would like to leave it as ‘Tour=No’
The following is what I want to get
ID | City | Country | Tour |
A | Yes | Yes | Yes |
A | Yes | Yes | Yes |
B | Yes | Yes | Yes |
C | No | No | No |
D | Yes | Yes | Yes |
I made my codes as follows
Proc sort data= work.Tour;
by ID;
Run;
Data work.Tour_I_want;
set work.Tour;
by ID;
if first.ID and last.ID then do;
if City = ‘Yes’ and Country = ‘Yes’ then Tour = 'Yes';
else Tour = 'No';
end;
else if last.ID=0 then do;
if City = ‘Yes’ and Country = ‘Yes’ then do;
Tour = 'Yes'; retain Tour; end;
else delete;
end;
else if last.ID then do;
if Tour = '1' then delete;
else if City = ‘Yes’ and Country = ‘Yes’ then Tour = '1';
else Tour = '0';
end;
Run;
This seems not efficient. I repeated the condition(City=’Yes’ and Country=’Yes’) three times.
Could you tell me the better way for this? Thank you in advance.
* I am using SAS university.
Next code gives the wanted result:
data have;
infile datalines;
input id $ city $ country $;
datalines;
A Yes Yes
A No No
A Yes Yes
B No No
B Yes Yes
C No No
D Yes Yes
; run;
data want;
set have;
by ID;
length Tour $3;
if city="Yes" and country="Yes"
then Tour="Yes";
else Tour="No";
if first.ID and last.ID
or Tour="Yes" then output;
run;
Unless I am mistaking your output, then a simple where clause can get you that output - note how I put test data as a datastep - this makes it easier for us to see your structure:
data have; input ID $ City $ Country $; datalines; A Yes Yes A No No A Yes Yes ; run; data want; set have (where=(city="Yes" and country="Yes")); tour="Yes"; run;
Thank you for your prompt answer.
I think I didn't explain what I want exactly.
To add some explanation,
At first, I would like to select by ID.
If the ID has Tour = 'Yes'(s), it should be selected regardless of repeated number. (So, there could be 2 observation for A)
If the ID doesn't have Tour = 'Yes', it should be presented as Tour = 'No' (I am not selecting Tour='Yes' only)
If the observation has Tour='Yes' and Tour='No', I just need Tour = 'Yes' only.
Briefly speaking,
At least, per ID, there should be Tour = 'Yes' or Tour = 'No'. If it has Tour = 'Yes', I don't want the observation with Tour ='No'.
And, I should check the number of Tour ='Yes', so the observation with Tour='Yes' should be selected.
I appreciate your help.
What if City=No and Country=Yes? What should happen then?
Next code gives the wanted result:
data have;
infile datalines;
input id $ city $ country $;
datalines;
A Yes Yes
A No No
A Yes Yes
B No No
B Yes Yes
C No No
D Yes Yes
; run;
data want;
set have;
by ID;
length Tour $3;
if city="Yes" and country="Yes"
then Tour="Yes";
else Tour="No";
if first.ID and last.ID
or Tour="Yes" then output;
run;
Thank you so much.
It is great!. It works perfectly. Very efficient!
Sorry. it was unneccessary two variables. I could made it one.
Let's say that there is only (City=Yes and Country=Yes) or (City=No and Country = No)
@Plessein wrote:
Sorry. it was unneccessary two variables. I could made it one.
Let's say that there is only (City=Yes and Country=Yes) or (City=No and Country = No)
And do what?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.