DATA Step, Macro, Functions and more

Data set creation using existing data sets

Reply
N/A
Posts: 0

Data set creation using existing data sets

I am having trouble creating a dataset using information in 2 other data sets: intervals and accounts. Basically I want to break down data from the accounts dataset based on the intervals in the intervals dataset to create a 3rd dataset which has each account and it's interval name listed.

My intervals data set looks like this:

grpnm min1 max1 min2 max2
a1 0 50 0 50
a2 0 50 100 150
b1 50 100 200 300
...

My second data set looks like this:
acc# bal1 bal2
h1 20 20
h2 40 45
h3 75 250
h4 35 125
h5 80 201

My end goal is to produce a dataset like this:

grpnm acc# bal1 bal2
a1 h1 20 20
a1 h2 40 45
a2 h4 35 125
a3 h3 75 250
a3 h5 80 201

I can't use a join since the datasets have nothing in common. I was thinking of a loop but i wasn't sure how I would loop only on one dataset in a DATA step. Anybody have any ideas?

Thanks in advance!!!
N/A
Posts: 0

Re: Data set creation using existing data sets

I see two approaches to the problem:

1) For each data observation, test the intervals and output appropriately.
2) For each interval, test the data and output appropriately.

To implement #2, I see more than one way:
1) data dummy; set interval; call execute ... to call a macro
2) internal loop in the data step to step through the data

To implement #1, I think an internal loop in the data step to run through the interval conditions makes most sense.
N/A
Posts: 0

Re: Data set creation using existing data sets

Hi Chuck,

THanks! -- would it be possible for you to provide an example? I see the syntax for do-while loops but I am not sure how I would code the condition to loop until the end of the data set. So, I would have something like this right...

data dummy;
set accounts;
keep interval acc# bal1 bal2;
do until (last row of the intervals table)
if bal1 > min1 & bal1min2 & bal2
then interval = grpnm;
end;

In the above, how would I specify that I want to iterate on rows in the intervals dataset. how would I specify that when I meet the if condition, I should include the row in the new dummy dataset and use the value of grpnm from intervals at the particular row i stopped?

Thanks again.
Elena
N/A
Posts: 0

Re: Data set creation using existing data sets

The Proc SQL solution is better and more elegant. I hadn't thought through the problem enough to realize that the implicit cartesian join was actually desirable in this case.

To do the loop, you need to read about the set command and the dataset options.
You can use multiple sets in a data step, so you would place one inside the loop, and one outside the loop. The one inside the loop would have to have the appropriate options set to capture the end of the file (end=eof). I'm not sure how to reset back to the beginning. You may need to use random access, requesting the observation specifically by number.

But, I wouldn't waste any time on the looping methodology.
N/A
Posts: 0

Re: Data set creation using existing data sets

Real easy with PROC SQL.

Don't get stuck in a rut that you have to have something in common to join.

PROC PRINT DATA=DATA1;
RUN;

PROC PRINT DATA=DATA2;
RUN;

PROC SQL;
CREATE TABLE DATA3 AS
SELECT
GRPNM,
ACCT,
BAL1,
BAL2
FROM
DATA1,
DATA2
WHERE
BAL1 BETWEEN MIN1 AND MAX1
AND
BAL2 BETWEEN MIN2 AND MAX2;

PROC PRINT DATA=DATA3;
RUN;


Obs GRPNM MIN1 MAX1 MIN2 MAX2

1 A1 0 50 0 50
2 A2 0 50 100 150
3 A3 50 100 200 300

Obs ACCT BAL1 BAL2

1 H1 20 20
2 H2 40 45
3 H3 75 250
4 H4 35 125
5 H5 80 201

Obs GRPNM ACCT BAL1 BAL2

1 A1 H1 20 20
2 A1 H2 40 45
3 A3 H3 75 250
4 A2 H4 35 125
5 A3 H5 80 201


Ike Eisenhauer
N/A
Posts: 0

Re: Data set creation using existing data sets

Thanks Ike, that worked!

I'm still curious about how I would loop through two dataset with different types of data in it using Do-Until, but for now at least I can get some work done!
Ask a Question
Discussion stats
  • 5 replies
  • 152 views
  • 0 likes
  • 1 in conversation