BookmarkSubscribeRSS Feed
Kc2
Quartz | Level 8 Kc2
Quartz | Level 8

Hello ,

I am trying to loop through multiple rows for a same ID and date  in order to create a derived variable.

My dataset is a follows:

ID          DATE                        SUBJECT               DERIVED VARIABLE
A          Jan1 2015                    Chemistry

                                               Physics

                                               Biology

A          Jan 10 2015                 Swimming 

                                               Track

                                               Math

A          Jan20 2015                  History

                                               Social Science

                                               English

B        Jan1 2015                      Swimming

                                                Track

                                                Math

B         Jan20 2015                  History

                                               Social Science

                                               English

etc...

I want to assign a the value of the derived variable based on the all the records for one subject  one date.

ID          DATE                        SUBJECT               DERIVED VARIABLE
A          Jan1 2015                    Chemistry

                                               Physics                         Science

                                               Biology

A          Jan 10 2015                 Swimming 

                                               Track                            Sports

                                               Math

A          Jan20 2015                  History              

                                               Social Science                 Humanities

                                               English

B        Jan1 2015                      Swimming

                                                Track                            Sports

                                                Math

B         Jan20 2015                  History

                                               Social Science

                                               English

How can I loop through all the records for one particular subject  for each date and then assign the derived value?

The program should check all observation for the same id and day and if the records have "physics,chemistry, biology" assign the value Science. If the records have "Simming, track" assign the value Sport.

I did a  sort by id and date and then tried with first.date but only the very first observation is taken into account.

thanks

KC

10 REPLIES 10
ballardw
Super User

Is you subject value one variable that contains 3 values?

What happens with Math when associated with Swimming and Track?

Kc2
Quartz | Level 8 Kc2
Quartz | Level 8

The dataset is one row per observation so the subject variable is  on each row. I presented the way I did for legibility purpose. This is how it looks.

ID          DATE                        SUBJECT               DERIVED VARIABLE
A          Jan1 2015                   Chemistry

A          Jan1 2015                   Physics

A          Jan1 2015                   Biology

Regarding Math, becasue two out or three fall inthe same category math is ignored.

If all three were different for example:

A          Jan 10 2015                 Swimming 

A          Jan 10 2015                 English                          

A          Jan 10 2015                 Math

then the derived variabel would be empty.

Reeza
Super User

I don't really see a way other than if/then statements with a combination of by group processing.

Here's a partial way. You'll need to then look at the final counts on last.id and determine what the derived variable is.

data want;

set have;

by id;

array subject(3) science sport humanities;

retain science sport humanities;

if first.id then call missing(science, sport, humanities);

if subject="Chemistry" then science+1;

else if subject="Swimming" then sport+1;

etc...

run;

Astounding
PROC Star

Looping through the data is not difficult ... but you will need to loop through it twice.  The first loop lets you assign your derived variable.  The second loop lets you read/output the same records with the derived variable attached.  For example:

data want;

   science = 0;

   humanities = 0;

   sports = 0;

   do until (last.date);

      set have;

      by id date;

      if subject in ('Chemistry', 'Physics', 'Biology') then science + 1;

      else if subject in ('Swimming', 'Track') then sports + 1;

      else if subject in ('English', 'Social Science', 'History') then humanities + 1;

   end;

   *** Add your logic here to create DERIVED_VARIABLE based on SCIENCE, HUMANITIES and SPORTS;

   do until (last.date);

      set have;

      by id date;

      output;

   end;

   **** optionally, drop science sports humanities;

run;

You may have to account for additional SUBJECT values, and you may have to deal with variations in spelling as well.

The logic between the loops is up to you ... how to handle ties, and various mixes.  But the data that you would need is all there.

Good luck.

Kc2
Quartz | Level 8 Kc2
Quartz | Level 8

None of the solution above seems to work.

I need the loop to go through all the observations per id and date and then assign the derived variable values.

I do not know which derived values to assign at the start it dpends on what the program will find in the  dataset.
kc

Reeza
Super User

None of the above are complete solutions. What code did you use that "didn't work"?

Astounding
PROC Star

You will need to do some of the work, if you want to use my solution.  There are too many rules that I cannot possibly know, and don't even want to ask.  For example, what should happen if a student has 3 science courses and 1 sports?  What if it is 2 of each?  Instead of trying to answer those types of questions, I came up with 3 variables:

SCIENCE = # of science courses

HUMANITIES = # of humanities courses

SPORTS = # of SPORTS courses

It is up to you to make up the rules and write the code that creates DERIVED_VALUE based on these variables.  I'm not going to ask for the rules for all the possible variations that might be found.  I indicated in the program the spot where you would have to add your logic.

Good luck.

PGStats
Opal | Level 21

I suggest that you define a small dataset to specify which subjects are in which disciplines and then use a majority rule to set your derived variable:

data disciplines;

length SUBJECT  Discipline $20;

input SUBJECT & Discipline ;

datalines;

Chemistry   Science

History   Humanities

Math   Science

Track   Sports

Biology   Science

English   Humanities

Physics   Science

Social Science   Humanities

Track   Sports

Swimming   Sports

;

data myDataset;

length ID DATE SUBJECT $20;

input ID DATE SUBJECT &;

datalines;

A Jan012015 Chemistry

A Jan012015 Physics

A Jan012015 Biology

A Jan102015 Swimming

A Jan102015 Track

A Jan102015 Math

A Jan202015 History

A Jan202015 Social Science

A Jan202015 English

B Jan012015 Swimming

B Jan012015 Track

B Jan012015 Math

B Jan202015 History

B Jan202015 Social Science

B Jan202015 English

C Jan302015 English

C Jan302015 Swimming

C Jan302015 Math

;

proc sql;

create table counts as

select ID, date, discipline, count(*) as n

from

    myDataset as a inner join

    disciplines as b on a.subject=b.subject

group by ID, date, discipline

having n > 1

order by ID, date, n descending;

quit;

data counts;

set counts; by ID date;

if first.date;

run;

proc sql;

create table want as

select a.*, b.discipline

from

    myDataset as a left join

    counts as b on a.ID=b.ID and a.date=b.date;

select * from want;

quit;

PG

PG
Kc2
Quartz | Level 8 Kc2
Quartz | Level 8

I used "Astounding's" code". In the first loop,

   do until (last.date);

      set have;

      by id date;

      if subject in ('Chemistry', 'Physics', 'Biology') then science + 1;

      else if subject in ('Swimming', 'Track') then sports + 1;

      else if subject in ('English', 'Social Science', 'History') then humanities + 1;

   end;

if takes only the first values and assign all the other flags to 0.

Thank you for you feedabck Reeza and Astounding. I will try to work with the hints you have provided.

KC

Astounding
PROC Star

In theory, that shouldn't happen.  If it does happen, it reflects what is in the data.  For example:

(1) There might be only one SUBJECT for an ID/DATE combination.

(2) The spelling of SUBJECT might be slightly different from the values being counted.

(3) The list of values being counted might be incomplete.

But it would be right to look at the data, not the programming logic.  This approach should definitely generate better counts.  Perhaps start with a PROC FREQ on SUBJECT, so you are certain of what is there.

Good luck.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 16691 views
  • 3 likes
  • 5 in conversation