Hi there,
So in my dataset I have variable ID which is repeated (only in name of it) for each participant. So for example ID=A-0-0082, it is repeated in rows due to fact that participant changed a (variable)job. So one row is A-0-0082 with job X and second row the same participant again A-0-0082 with job X, the next three rows are for next participant with id B-0-0011 with jobs X,Y,Z etc. So what i want to do is to select only rows from id (A-W-0-10-0-9999) <--- format of ID (A-W is a range, same for first number 0-10 range, same with third "value" 0-9999, what creates for example B-2-0012) with for example job X note: that i want to read all the rest variables like age, gender etc in new datasets.
First time I have a such data and I don't know even how to start. Please, help me? is it even possible?
I can't paste you dataset or code, sorry for that.
@Skillside wrote:
Hi, thank you for that question. I know that situation is a bit weird. I wanted to create new dataset with observations (whole row including the rest variables like sex, age etc.) where ID with JOB Y appears, that's all.
Regards,
P.S. Sorry, I couldn't replay earlier.
So instead of all of the observation for those ids you just want some of them?
So you just want to filter based on the value of JOB?
data want;
set have;
where job='Y';
run;
@Skillside wrote:
Hi there,
So in my dataset I have variable ID which is repeated (only in name of it) for each participant. So for example ID=A-0-0082, it is repeated in rows due to fact that participant changed a (variable)job. So one row is A-0-0082 with job X and second row the same participant again A-0-0082 with job X, the next three rows are for next participant with id B-0-0011 with jobs X,Y,Z etc. So what i want to do is to select only rows from id (A-W-0-10-0-9999) <--- format of ID (A-W is a range, same for first number 0-10 range, same with third "value" 0-9999, what creates for example B-2-0012) with for example job X note: that i want to read all the rest variables like age, gender etc in new datasets.
First time I have a such data and I don't know even how to start. Please, help me? is it even possible?
I can't paste you dataset or code, sorry for that.
If you can't post the original data, no problem, but then please make-up some data and post it in usable form. Without any data it is - at least for me - impossible to suggest something.
Hope this will help:
ID JOB AGE SEX etc.
A-0-0082 X
A-0-0082 Y <---want to select this for new dataset
A-0-0082 Z
A-0-0111 Y <---want to select this for new dataset
A-0-0111 C
B-0-0012 D
B-0-0012 Y
B-0-0012 Y <---want to select this for new dataset (last one)
Cheers,
Please, note that in dataset there can be a duplicates, i would like to avoid that.
If the data can have duplicates, include those in your example data, so that suggest code can take that into account. And please note that "usable form" means a data step using datelines-statement, so that the data can be used without writing additional code and without guessing types and lengths.
I have made it too complicated. I spoke with one of my friends about that, he said that the point of this analysis is to not lose observations. So i should do the loop procedure/option for ID duplicates. Can't imagine how does it work but i will try.
@Skillside wrote:
I have made it too complicated. I spoke with one of my friends about that, he said that the point of this analysis is to not lose observations. So i should do the loop procedure/option for ID duplicates. Can't imagine how does it work but i will try.
Please explain what this means. Is the point to just create a new variable that check if the ID ever had a value of Y? Or perhaps a new variable that has the value of some other variable from the observation where the Y appeared?
data want ;
input id $ jobcode $ any_y ;
cards;
1 X 0
1 Z 0
2 X 1
2 Y 1
;
Hi, thank you for that question. I know that situation is a bit weird. I wanted to create new dataset with observations (whole row including the rest variables like sex, age etc.) where ID with JOB Y appears, that's all.
Regards,
P.S. Sorry, I couldn't replay earlier.
@Skillside wrote:
Hi, thank you for that question. I know that situation is a bit weird. I wanted to create new dataset with observations (whole row including the rest variables like sex, age etc.) where ID with JOB Y appears, that's all.
Regards,
P.S. Sorry, I couldn't replay earlier.
So instead of all of the observation for those ids you just want some of them?
So you just want to filter based on the value of JOB?
data want;
set have;
where job='Y';
run;
Yes, it seems that this is the easiest solution, thank You.
If it's possible, please close the topic.
Honestly I don't understand your Q very well.
data have;
input ID $ JOB $ ;
cards;
A-0-0082 X
A-0-0082 Y <---want to select this for new dataset
A-0-0082 Z
A-0-0111 Y <---want to select this for new dataset
A-0-0111 C
B-0-0012 D
B-0-0012 Y
B-0-0012 Y
;
run;
data temp;
set have;
by id job notsorted;
group+first.job;
run;
data want;
set temp;
by group;
if last.group and job='Y';
run;
Dear Ksharp,
thank you for your answer.
isn't it that the command "if last.group and job='Y' will generate only information about last ID (from the same group of ID) with job y?
How to do loop by ID and what result will I recieve? or will 'loop' leave only first observation or last observation from the same IDs?
Feeling honured with your response Ksharp.
Thanks again
""if last.group and job='Y' will generate only information about last ID (from the same group of ID) with job y?"
It will generate the last obs in each group and the last obs must be 'Y' .
"How to do loop by ID and what result will I recieve?"
Each ID is divided by several group , you can see it in my TEMP dataset.
Thank you agian,
I would like to have different ID with the same JOB in the same group, can you show me how to do this, please?
Regards
"I would like to have different ID with the same JOB in the same group,"
What does the GROUP stands for(i.e. what is your GROUP variable ) ?
And I think my code could get what you want.
You could run my code on your real data, and see if there is any problem , and post the problem again .
Hi, so now i am facing different problem. How to select "best" duplicate with the least lack of values in different variables in row?
data have;
input ID $ JOB $ age $ MS;
cards;
A-0-0082 X 25
A-0-0082 Y 25 yes <---want to select this for new dataset
A-0-0082 Z 25
A-0-0111 Y 36 no <---want to select this for new dataset
A-0-0111 C 36 no
B-0-0012 D 20 no
B-0-0012 Y yes
B-0-0012 Y 40 no <---want to select this for new dataset
;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.