This is raw data ‘Tour’
ID | City | Country | Tour |
A | Yes | Yes |
|
A | No | No |
|
A | Yes | Yes |
|
B | No | No |
|
B | Yes | Yes |
|
C | No | No |
|
D | Yes | Yes |
|
I would like to make my codes as following:
1. If people who had ID had visited city & country, I would like to check Tour=’Yes’ (I have to count Tour=‘Yes’)
2. I would like to delete the observation of which Tour = No if people have a record of ‘Tour=Yes’
3. If people had not visited any city & country, I would like to leave it as ‘Tour=No’
The following is what I want to get
ID | City | Country | Tour |
A | Yes | Yes | Yes |
A | Yes | Yes | Yes |
B | Yes | Yes | Yes |
C | No | No | No |
D | Yes | Yes | Yes |
I made my codes as follows
Proc sort data= work.Tour;
by ID;
Run;
Data work.Tour_I_want;
set work.Tour;
by ID;
if first.ID and last.ID then do;
if City = ‘Yes’ and Country = ‘Yes’ then Tour = 'Yes';
else Tour = 'No';
end;
else if last.ID=0 then do;
if City = ‘Yes’ and Country = ‘Yes’ then do;
Tour = 'Yes'; retain Tour; end;
else delete;
end;
else if last.ID then do;
if Tour = '1' then delete;
else if City = ‘Yes’ and Country = ‘Yes’ then Tour = '1';
else Tour = '0';
end;
Run;
This seems not efficient. I repeated the condition(City=’Yes’ and Country=’Yes’) three times.
Could you tell me the better way for this? Thank you in advance.
* I am using SAS university.
Next code gives the wanted result:
data have;
infile datalines;
input id $ city $ country $;
datalines;
A Yes Yes
A No No
A Yes Yes
B No No
B Yes Yes
C No No
D Yes Yes
; run;
data want;
set have;
by ID;
length Tour $3;
if city="Yes" and country="Yes"
then Tour="Yes";
else Tour="No";
if first.ID and last.ID
or Tour="Yes" then output;
run;
Unless I am mistaking your output, then a simple where clause can get you that output - note how I put test data as a datastep - this makes it easier for us to see your structure:
data have; input ID $ City $ Country $; datalines; A Yes Yes A No No A Yes Yes ; run; data want; set have (where=(city="Yes" and country="Yes")); tour="Yes"; run;
Thank you for your prompt answer.
I think I didn't explain what I want exactly.
To add some explanation,
At first, I would like to select by ID.
If the ID has Tour = 'Yes'(s), it should be selected regardless of repeated number. (So, there could be 2 observation for A)
If the ID doesn't have Tour = 'Yes', it should be presented as Tour = 'No' (I am not selecting Tour='Yes' only)
If the observation has Tour='Yes' and Tour='No', I just need Tour = 'Yes' only.
Briefly speaking,
At least, per ID, there should be Tour = 'Yes' or Tour = 'No'. If it has Tour = 'Yes', I don't want the observation with Tour ='No'.
And, I should check the number of Tour ='Yes', so the observation with Tour='Yes' should be selected.
I appreciate your help.
What if City=No and Country=Yes? What should happen then?
Next code gives the wanted result:
data have;
infile datalines;
input id $ city $ country $;
datalines;
A Yes Yes
A No No
A Yes Yes
B No No
B Yes Yes
C No No
D Yes Yes
; run;
data want;
set have;
by ID;
length Tour $3;
if city="Yes" and country="Yes"
then Tour="Yes";
else Tour="No";
if first.ID and last.ID
or Tour="Yes" then output;
run;
Thank you so much.
It is great!. It works perfectly. Very efficient!
Sorry. it was unneccessary two variables. I could made it one.
Let's say that there is only (City=Yes and Country=Yes) or (City=No and Country = No)
@Plessein wrote:
Sorry. it was unneccessary two variables. I could made it one.
Let's say that there is only (City=Yes and Country=Yes) or (City=No and Country = No)
And do what?
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.