Re: How to loop through observations for the same id and same date usi...

Kc2 · Posted 01-20-2015 11:55 AM

Hello ,

I am trying to loop through multiple rows for a same ID and date in order to create a derived variable.

My dataset is a follows:

ID DATE SUBJECT DERIVED VARIABLE
A Jan1 2015 Chemistry

Physics

Biology

A Jan 10 2015 Swimming

Track

Math

A Jan20 2015 History

Social Science

English

B Jan1 2015 Swimming

Track

Math

B Jan20 2015 History

Social Science

English

etc...

I want to assign a the value of the derived variable based on the all the records for one subject one date.

ID DATE SUBJECT DERIVED VARIABLE
A Jan1 2015 Chemistry

Physics Science

Biology

A Jan 10 2015 Swimming

Track Sports

Math

A Jan20 2015 History

Social Science Humanities

English

B Jan1 2015 Swimming

Track Sports

Math

B Jan20 2015 History

Social Science

English

How can I loop through all the records for one particular subject for each date and then assign the derived value?

The program should check all observation for the same id and day and if the records have "physics,chemistry, biology" assign the value Science. If the records have "Simming, track" assign the value Sport.

I did a sort by id and date and then tried with first.date but only the very first observation is taken into account.

thanks

KC

ballardw · Posted 01-20-2015 12:13 PM

Is you subject value one variable that contains 3 values?

What happens with Math when associated with Swimming and Track?

Kc2 · Posted 01-20-2015 12:22 PM

The dataset is one row per observation so the subject variable is on each row. I presented the way I did for legibility purpose. This is how it looks.

ID DATE SUBJECT DERIVED VARIABLE
A Jan1 2015 Chemistry

A Jan1 2015 Physics

A Jan1 2015 Biology

Regarding Math, becasue two out or three fall inthe same category math is ignored.

If all three were different for example:

A Jan 10 2015 Swimming

A Jan 10 2015 English

A Jan 10 2015 Math

then the derived variabel would be empty.

Reeza · Posted 01-20-2015 12:51 PM

I don't really see a way other than if/then statements with a combination of by group processing.

Here's a partial way. You'll need to then look at the final counts on last.id and determine what the derived variable is.

data want;

set have;

by id;

array subject(3) science sport humanities;

retain science sport humanities;

if first.id then call missing(science, sport, humanities);

if subject="Chemistry" then science+1;

else if subject="Swimming" then sport+1;

etc...

run;

Astounding · Posted 01-20-2015 12:47 PM

Looping through the data is not difficult ... but you will need to loop through it twice. The first loop lets you assign your derived variable. The second loop lets you read/output the same records with the derived variable attached. For example:

data want;

science = 0;

humanities = 0;

sports = 0;

do until (last.date);

set have;

by id date;

if subject in ('Chemistry', 'Physics', 'Biology') then science + 1;

else if subject in ('Swimming', 'Track') then sports + 1;

else if subject in ('English', 'Social Science', 'History') then humanities + 1;

end;

*** Add your logic here to create DERIVED_VARIABLE based on SCIENCE, HUMANITIES and SPORTS;

do until (last.date);

set have;

by id date;

output;

end;

**** optionally, drop science sports humanities;

run;

You may have to account for additional SUBJECT values, and you may have to deal with variations in spelling as well.

The logic between the loops is up to you ... how to handle ties, and various mixes. But the data that you would need is all there.

Good luck.

Kc2 · Posted 01-20-2015 02:05 PM

None of the solution above seems to work.

I need the loop to go through all the observations per id and date and then assign the derived variable values.

I do not know which derived values to assign at the start it dpends on what the program will find in the dataset.
kc

Reeza · Posted 01-20-2015 02:09 PM

None of the above are complete solutions. What code did you use that "didn't work"?

Astounding · Posted 01-20-2015 02:10 PM

You will need to do some of the work, if you want to use my solution. There are too many rules that I cannot possibly know, and don't even want to ask. For example, what should happen if a student has 3 science courses and 1 sports? What if it is 2 of each? Instead of trying to answer those types of questions, I came up with 3 variables:

SCIENCE = # of science courses

HUMANITIES = # of humanities courses

SPORTS = # of SPORTS courses

It is up to you to make up the rules and write the code that creates DERIVED_VALUE based on these variables. I'm not going to ask for the rules for all the possible variations that might be found. I indicated in the program the spot where you would have to add your logic.

Good luck.

PGStats · Posted 01-20-2015 02:40 PM

I suggest that you define a small dataset to specify which subjects are in which disciplines and then use a majority rule to set your derived variable:

data disciplines;

length SUBJECT Discipline $20;

input SUBJECT & Discipline ;

datalines;

Chemistry Science

History Humanities

Math Science

Track Sports

Biology Science

English Humanities

Physics Science

Social Science Humanities

Track Sports

Swimming Sports

;

data myDataset;

length ID DATE SUBJECT $20;

input ID DATE SUBJECT &;

datalines;

A Jan012015 Chemistry

A Jan012015 Physics

A Jan012015 Biology

A Jan102015 Swimming

A Jan102015 Track

A Jan102015 Math

A Jan202015 History

A Jan202015 Social Science

A Jan202015 English

B Jan012015 Swimming

B Jan012015 Track

B Jan012015 Math

B Jan202015 History

B Jan202015 Social Science

B Jan202015 English

C Jan302015 English

C Jan302015 Swimming

C Jan302015 Math

;

proc sql;

create table counts as

select ID, date, discipline, count(*) as n

from

myDataset as a inner join

disciplines as b on a.subject=b.subject

group by ID, date, discipline

having n > 1

order by ID, date, n descending;

quit;

data counts;

set counts; by ID date;

if first.date;

run;

proc sql;

create table want as

select a.*, b.discipline

from

myDataset as a left join

counts as b on a.ID=b.ID and a.date=b.date;

select * from want;

quit;

PG

Kc2 · Posted 01-20-2015 02:18 PM

I used "Astounding's" code". In the first loop,

do until (last.date);

set have;

by id date;

if subject in ('Chemistry', 'Physics', 'Biology') then science + 1;

else if subject in ('Swimming', 'Track') then sports + 1;

else if subject in ('English', 'Social Science', 'History') then humanities + 1;

end;

if takes only the first values and assign all the other flags to 0.

Thank you for you feedabck Reeza and Astounding. I will try to work with the hints you have provided.

KC

Astounding · Posted 01-20-2015 02:34 PM

In theory, that shouldn't happen. If it does happen, it reflects what is in the data. For example:

(1) There might be only one SUBJECT for an ID/DATE combination.

(2) The spelling of SUBJECT might be slightly different from the values being counted.

(3) The list of values being counted might be incomplete.

But it would be right to look at the data, not the programming logic. This approach should definitely generate better counts. Perhaps start with a PROC FREQ on SUBJECT, so you are certain of what is there.

Good luck.

How to loop through observations for the same id and same date using a do loop in a data step?

Re: How to loop through observations for the same id and same date using a do loop in a data step?

Re: How to loop through observations for the same id and same date using a do loop in a data step?

Re: How to loop through observations for the same id and same date using a do loop in a data step?

Re: How to loop through observations for the same id and same date using a do loop in a data step?

Re: How to loop through observations for the same id and same date using a do loop in a data step?

Re: How to loop through observations for the same id and same date using a do loop in a data step?

Re: How to loop through observations for the same id and same date using a do loop in a data step?

Re: How to loop through observations for the same id and same date using a do loop in a data step?

Re: How to loop through observations for the same id and same date using a do loop in a data step?

Re: How to loop through observations for the same id and same date using a do loop in a data step?

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away