BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Skillside
Calcite | Level 5

Hi there,

So in my dataset I have variable ID which is repeated (only in name of it) for each participant. So for example ID=A-0-0082, it is repeated in rows due to fact that participant changed a (variable)job. So one row is A-0-0082 with job X and second row the same participant again A-0-0082 with job X, the next three rows are for next participant with id B-0-0011 with jobs X,Y,Z etc. So what i want to do is to select only rows from id (A-W-0-10-0-9999) <--- format of ID (A-W is a range, same for first number 0-10 range, same with third "value" 0-9999, what creates for example B-2-0012) with for example job X note: that i want to read all the rest variables like age, gender etc in new datasets.

First time I have a such data and I don't know even how to start. Please, help me? is it even possible?

I can't paste you dataset or code, sorry for that.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@Skillside wrote:

Hi, thank you for that question. I know that situation is a bit weird. I wanted to create new dataset with observations (whole row including the rest variables like sex, age etc.) where ID with JOB Y appears, that's all.

Regards,

P.S. Sorry, I couldn't replay earlier.


So instead of all of the observation for those ids you just want some of them?

So you just want to filter based on the value of JOB?

data want;
  set have;
  where job='Y';
run;

 

View solution in original post

18 REPLIES 18
andreas_lds
Jade | Level 19

@Skillside wrote:

Hi there,

So in my dataset I have variable ID which is repeated (only in name of it) for each participant. So for example ID=A-0-0082, it is repeated in rows due to fact that participant changed a (variable)job. So one row is A-0-0082 with job X and second row the same participant again A-0-0082 with job X, the next three rows are for next participant with id B-0-0011 with jobs X,Y,Z etc. So what i want to do is to select only rows from id (A-W-0-10-0-9999) <--- format of ID (A-W is a range, same for first number 0-10 range, same with third "value" 0-9999, what creates for example B-2-0012) with for example job X note: that i want to read all the rest variables like age, gender etc in new datasets.

First time I have a such data and I don't know even how to start. Please, help me? is it even possible?

I can't paste you dataset or code, sorry for that.


 

If you can't post the original data, no problem, but then please make-up some data and post it in usable form. Without any data it is - at least for me - impossible to suggest something.

Skillside
Calcite | Level 5

Hope this will help:

 

ID                   JOB   AGE   SEX etc.

A-0-0082        X

A-0-0082        Y                                     <---want to select this for new dataset

A-0-0082        Z

A-0-0111        Y                                     <---want to select this for new dataset

A-0-0111        C

B-0-0012       D

B-0-0012       Y                                    

B-0-0012       Y                                       <---want to select this for new dataset (last one)

Cheers,

Please, note that in dataset there can be a duplicates, i would like to avoid that.

andreas_lds
Jade | Level 19

If the data can have duplicates, include those in your example data, so that suggest code can take that into account. And please note that "usable form" means a data step using datelines-statement, so that the data can be used without writing additional code and without guessing types and lengths.

Skillside
Calcite | Level 5

I have made it too complicated. I spoke with one of my friends about that, he said that the point of this analysis is to not lose observations. So i should do the loop procedure/option for ID duplicates. Can't imagine how does it work but i will try.

 

Tom
Super User Tom
Super User

@Skillside wrote:

I have made it too complicated. I spoke with one of my friends about that, he said that the point of this analysis is to not lose observations. So i should do the loop procedure/option for ID duplicates. Can't imagine how does it work but i will try.

 


Please explain what this means.  Is the point to just create a new variable that check if the ID ever had a value of Y?  Or perhaps a new variable that has the value of some other variable from the observation where the Y appeared?

 

data want ;
  input id $ jobcode $ any_y ;
cards;
1 X 0
1 Z 0
2 X 1
2 Y 1
;
Skillside
Calcite | Level 5

Hi, thank you for that question. I know that situation is a bit weird. I wanted to create new dataset with observations (whole row including the rest variables like sex, age etc.) where ID with JOB Y appears, that's all.

Regards,

P.S. Sorry, I couldn't replay earlier.

Tom
Super User Tom
Super User

@Skillside wrote:

Hi, thank you for that question. I know that situation is a bit weird. I wanted to create new dataset with observations (whole row including the rest variables like sex, age etc.) where ID with JOB Y appears, that's all.

Regards,

P.S. Sorry, I couldn't replay earlier.


So instead of all of the observation for those ids you just want some of them?

So you just want to filter based on the value of JOB?

data want;
  set have;
  where job='Y';
run;

 

Skillside
Calcite | Level 5

Yes, it seems that this is the easiest solution, thank You.

If it's possible, please close the topic.

Ksharp
Super User

Honestly I don't understand your Q very well.

 

data have;
input ID $       JOB   $ ;
cards;
A-0-0082        X
A-0-0082        Y                                     <---want to select this for new dataset
A-0-0082        Z
A-0-0111        Y                                     <---want to select this for new dataset
A-0-0111        C
B-0-0012       D
B-0-0012       Y                                    
B-0-0012       Y
;
run;

data temp;
 set have;
 by id job notsorted;
 group+first.job;
run;
data want;
 set temp;
 by group;
 if last.group and job='Y';
run;
Skillside
Calcite | Level 5

Dear Ksharp,

thank you for your answer.

isn't it that the command "if last.group and job='Y' will generate only information about last ID (from the same group of ID) with job y?

How to do loop by ID and what result will I recieve? or will 'loop'  leave only first observation or last observation from the same IDs?

Feeling honured with your response Ksharp.

Thanks again

 

 

Ksharp
Super User

""if last.group and job='Y' will generate only information about last ID (from the same group of ID) with job y?"

It will generate the last obs in each group and the last obs must be 'Y' .

 

"How to do loop by ID and what result will I recieve?"

Each ID is divided by several group , you can see it in my TEMP dataset.

Skillside
Calcite | Level 5

Thank you agian,

I would like to have different ID with the same JOB in the same group, can you show me how to do this, please?

Regards

Ksharp
Super User

"I would like to have different ID with the same JOB in the same group,"

What does the GROUP stands for(i.e.  what is your GROUP variable ) ?

And I think my code could get what you want. 

You could run my code on your real data, and see if there is any problem , and post the problem again .

 

Skillside
Calcite | Level 5

Hi, so now i am facing different problem. How to select "best" duplicate with the least lack of values in different variables in row?

 

data have;
input ID $ JOB $ age $ MS;
cards;
A-0-0082 X 25
A-0-0082 Y 25 yes <---want to select this for new dataset
A-0-0082 Z 25
A-0-0111 Y 36 no <---want to select this for new dataset
A-0-0111 C 36 no
B-0-0012 D 20 no
B-0-0012 Y yes
B-0-0012 Y 40 no <---want to select this for new dataset
;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 18 replies
  • 1039 views
  • 0 likes
  • 4 in conversation