Solved: Complicated selection of rows for new dataset

Skillside · Posted 07-12-2019 04:16 AM

Hi there,

So in my dataset I have variable ID which is repeated (only in name of it) for each participant. So for example ID=A-0-0082, it is repeated in rows due to fact that participant changed a (variable)job. So one row is A-0-0082 with job X and second row the same participant again A-0-0082 with job X, the next three rows are for next participant with id B-0-0011 with jobs X,Y,Z etc. So what i want to do is to select only rows from id (A-W-0-10-0-9999) <--- format of ID (A-W is a range, same for first number 0-10 range, same with third "value" 0-9999, what creates for example B-2-0012) with for example job X note: that i want to read all the rest variables like age, gender etc in new datasets.

First time I have a such data and I don't know even how to start. Please, help me? is it even possible?

I can't paste you dataset or code, sorry for that.

Tom · Posted 07-14-2019 02:49 PM

@Skillside wrote:

Hi, thank you for that question. I know that situation is a bit weird. I wanted to create new dataset with observations (whole row including the rest variables like sex, age etc.) where ID with JOB Y appears, that's all.

Regards,

P.S. Sorry, I couldn't replay earlier.

So instead of all of the observation for those ids you just want some of them?

So you just want to filter based on the value of JOB?

data want;
  set have;
  where job='Y';
run;

View solution in original post

andreas_lds · Posted 07-12-2019 04:36 AM

@Skillside wrote:

Hi there,

So in my dataset I have variable ID which is repeated (only in name of it) for each participant. So for example ID=A-0-0082, it is repeated in rows due to fact that participant changed a (variable)job. So one row is A-0-0082 with job X and second row the same participant again A-0-0082 with job X, the next three rows are for next participant with id B-0-0011 with jobs X,Y,Z etc. So what i want to do is to select only rows from id (A-W-0-10-0-9999) <--- format of ID (A-W is a range, same for first number 0-10 range, same with third "value" 0-9999, what creates for example B-2-0012) with for example job X note: that i want to read all the rest variables like age, gender etc in new datasets.

First time I have a such data and I don't know even how to start. Please, help me? is it even possible?

I can't paste you dataset or code, sorry for that.

If you can't post the original data, no problem, but then please make-up some data and post it in usable form. Without any data it is - at least for me - impossible to suggest something.

Skillside · Posted 07-12-2019 04:59 AM

Hope this will help:

ID JOB AGE SEX etc.

A-0-0082 X

A-0-0082 Y <---want to select this for new dataset

A-0-0082 Z

A-0-0111 Y <---want to select this for new dataset

A-0-0111 C

B-0-0012 D

B-0-0012 Y

B-0-0012 Y <---want to select this for new dataset (last one)

Cheers,

Please, note that in dataset there can be a duplicates, i would like to avoid that.

andreas_lds · Posted 07-12-2019 05:09 AM

If the data can have duplicates, include those in your example data, so that suggest code can take that into account. And please note that "usable form" means a data step using datelines-statement, so that the data can be used without writing additional code and without guessing types and lengths.

Skillside · Posted 07-12-2019 07:00 AM

I have made it too complicated. I spoke with one of my friends about that, he said that the point of this analysis is to not lose observations. So i should do the loop procedure/option for ID duplicates. Can't imagine how does it work but i will try.

Tom · Posted 07-12-2019 12:05 PM

@Skillside wrote:

I have made it too complicated. I spoke with one of my friends about that, he said that the point of this analysis is to not lose observations. So i should do the loop procedure/option for ID duplicates. Can't imagine how does it work but i will try.

Please explain what this means. Is the point to just create a new variable that check if the ID ever had a value of Y? Or perhaps a new variable that has the value of some other variable from the observation where the Y appeared?

data want ;
  input id $ jobcode $ any_y ;
cards;
1 X 0
1 Z 0
2 X 1
2 Y 1
;

Skillside · Posted 07-14-2019 02:13 PM

Hi, thank you for that question. I know that situation is a bit weird. I wanted to create new dataset with observations (whole row including the rest variables like sex, age etc.) where ID with JOB Y appears, that's all.

Regards,

P.S. Sorry, I couldn't replay earlier.

Tom · Posted 07-14-2019 02:49 PM

@Skillside wrote:

Hi, thank you for that question. I know that situation is a bit weird. I wanted to create new dataset with observations (whole row including the rest variables like sex, age etc.) where ID with JOB Y appears, that's all.

Regards,

P.S. Sorry, I couldn't replay earlier.

So instead of all of the observation for those ids you just want some of them?

So you just want to filter based on the value of JOB?

data want;
  set have;
  where job='Y';
run;

Skillside · Posted 07-14-2019 03:24 PM

Yes, it seems that this is the easiest solution, thank You.

If it's possible, please close the topic.

Ksharp · Posted 07-12-2019 08:00 AM

Honestly I don't understand your Q very well.

data have;
input ID $       JOB   $ ;
cards;
A-0-0082        X
A-0-0082        Y                                     <---want to select this for new dataset
A-0-0082        Z
A-0-0111        Y                                     <---want to select this for new dataset
A-0-0111        C
B-0-0012       D
B-0-0012       Y                                    
B-0-0012       Y
;
run;

data temp;
 set have;
 by id job notsorted;
 group+first.job;
run;
data want;
 set temp;
 by group;
 if last.group and job='Y';
run;

Skillside · Posted 07-12-2019 08:16 AM

Dear Ksharp,

thank you for your answer.

isn't it that the command "if last.group and job='Y' will generate only information about last ID (from the same group of ID) with job y?

How to do loop by ID and what result will I recieve? or will 'loop' leave only first observation or last observation from the same IDs?

Feeling honured with your response Ksharp.

Thanks again

Ksharp · Posted 07-12-2019 08:57 AM

""if last.group and job='Y' will generate only information about last ID (from the same group of ID) with job y?"

It will generate the last obs in each group and the last obs must be 'Y' .

"How to do loop by ID and what result will I recieve?"

Each ID is divided by several group , you can see it in my TEMP dataset.

Skillside · Posted 07-12-2019 09:30 AM

Thank you agian,

I would like to have different ID with the same JOB in the same group, can you show me how to do this, please?

Regards

Ksharp · Posted 07-12-2019 09:47 AM

"I would like to have different ID with the same JOB in the same group,"

What does the GROUP stands for(i.e. what is your GROUP variable ) ?

And I think my code could get what you want.

You could run my code on your real data, and see if there is any problem , and post the problem again .

Skillside · Posted 07-15-2019 04:01 AM

Hi, so now i am facing different problem. How to select "best" duplicate with the least lack of values in different variables in row?

data have;
input ID $ JOB $ age $ MS;
cards;
A-0-0082 X 25
A-0-0082 Y 25 yes <---want to select this for new dataset
A-0-0082 Z 25
A-0-0111 Y 36 no <---want to select this for new dataset
A-0-0111 C 36 no
B-0-0012 D 20 no
B-0-0012 Y yes
B-0-0012 Y 40 no <---want to select this for new dataset
;
run;

Complicated selection of rows for new dataset

Re: Complicated selection of rows for new dataset

Re: Complicated selection of rows for new dataset

Re: Complicated selection of rows for new dataset

Re: Complicated selection of rows for new dataset

Re: Complicated selection of rows for new dataset

Re: Complicated selection of rows for new dataset

Re: Complicated selection of rows for new dataset

Re: Complicated selection of rows for new dataset

Re: Complicated selection of rows for new dataset

Re: Complicated selection of rows for new dataset

Re: Complicated selection of rows for new dataset

Re: Complicated selection of rows for new dataset

Re: Complicated selection of rows for new dataset

Re: Complicated selection of rows for new dataset

Re: Complicated selection of rows for new dataset

Ready to join fellow brilliant minds for the SAS Hackathon?