creating new rows

I’m switching my data source from an old matrix dataset over to columnar data from an Oracle table, but need some help on generating a new variable based on a series of conditions.

In the old system:

STATION DATETIME TKNT NOXT NOXD
ABC 200903251200 1.4 0.5 0.1
DEF 200903251300 2.0 0.3 .
GHI 200903251400 . 0.6 0.1
JKL 200903251500 1.3 . .

Using this code in a DATA step

DATA NEWMATRIX ; SET MATRIXDATA ;
…stuff…
IF TN = . THEN DO ;
IF TKNT EQ . THEN TN = . ;
ELSE IF NOXD NE . THEN TN = TKNT + NOXD ;
ELSE IF NOXT NE . THEN TN = TKNT + NOXT ;
ELSE TN = TKNT ;
END ;
…more stuff…
RUN ;

Gives this:

STATION DATETIME TKNT NOXT NOXD TN
ABC 200903251200 1.4 0.5 0.1 1.5
DEF 200903251300 2.0 0.3 . 2.3
GHI 200903251400 . 0.6 0.1 .
JKL 200903251500 1.3 . . 1.3

Now, my data looks like this:

STATION DATETIME PARAMETER VALUE
ABC 200903251200 TKNT 1.4
ABC 200903251200 NOXT 0.5
ABC 200903251200 NOXD 0.1
DEF 200903251300 TKNT 2.0
DEF 200903251300 NOXT 0.3
GHI 200903251400 NOXT 0.6
…etc…

So how do I get here?

DATA NEWCOLUMN ; SET COLUMNDATA ;
Then what???

STATION DATETIME PARAMETER VALUE
ABC 200903251200 TN 1.5
DEF 200903251300 TN 2.3
JKL 200903251500 TN 1.3

Thanks so much for any references or help you can give me!

Wendy

Re: creating new rows

I don't really follow you all the way. Which table is your desired final result?

It seems that you want to transpose the data in some way. You can accomplish this with multiple OUTPUT statements, after you have assigned the new columns their appropriate values. You might also have a look at PROC TRANSPOSE.
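For illustration, a minimal sketch of the PROC TRANSPOSE route. It assumes the columnar table is called COLUMNDATA, as in the first post, and that STATION plus DATETIME identify one sample; the real key list may be longer. SORTED and WIDE are hypothetical dataset names.

```sas
/* Sketch only: dataset names SORTED and WIDE are made up for this example. */
PROC SORT DATA=COLUMNDATA OUT=SORTED ;
  BY STATION DATETIME ;
RUN ;

/* One output row per STATION/DATETIME, one column per PARAMETER value. */
PROC TRANSPOSE DATA=SORTED OUT=WIDE (DROP=_NAME_) ;
  BY STATION DATETIME ;
  ID PARAMETER ;   /* TKNT, NOXT, NOXD become column names */
  VAR VALUE ;
RUN ;
```

The ID statement expects each PARAMETER value to appear at most once per BY group, so duplicate measurements would need to be resolved first. Parameters that are absent for a sample come out as missing, which is what the original IF block already handles.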

Regards,
Linus
Data never sleeps

Re: creating new rows

Linus - I’m sorry that I didn’t make myself clear in the original post. I'm looking for a way to do the same calculation without a transposition step.

The point I was trying to make is that my data used to be in matrix format, and to make a new parameter (TN), it was quite simple to combine existing parameters when they were all in separate columns with a unique ID for each dataline.

STATION DATETIME TKNT NOXT NOXD

Thus, to make the new column of TN, I used

IF TN = . THEN DO ;
IF TKNT EQ . THEN TN = . ;
ELSE IF NOXD NE . THEN TN = TKNT + NOXD ;
ELSE IF NOXT NE . THEN TN = TKNT + NOXT ;
ELSE TN = TKNT ;
END ;

My old matrix dataset no longer exists, and my data now exists in columnar format, so I have multiple rows per id, one parameter per line. So the values TKNT, NOXT, & NOXD now are contained in the parameter column, and the numeric value is contained in the column VALUE.

STATION DATETIME PARAMETER VALUE

My question is: how do I perform the same calculation from above without pulling out the relevant rows and flipping them to matrix format first?

The main dataset currently runs about 1.1 million records, so it would be really nice to do this directly.

Thanks for any help you can give me!

Wendy

Re: creating new rows

I think you still need some kind of transposing, but it can be done in one step if your data is already sorted by your ID column (STATION). Something like this:

data tn;
  set notn;
  by station;
  retain tknt noxt noxd;
  if first.station then do;
    tknt = .;
    noxt = .;
    noxd = .;
  end;
  select(parameter);
    when ('TKNT') tknt = value;
    when ('NOXT') noxt = value;
    when ('NOXD') noxd = value;
    otherwise;
  end;
  if last.station then do;
    /* your IF block goes here */
    output;
  end;
run;
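As a sketch of how the skeleton above might look with the IF block from the original post filled in: this version assumes the table and column names from the first post (COLUMNDATA, STATION, DATETIME, PARAMETER, VALUE), groups by both STATION and DATETIME rather than STATION alone, and writes the result back out as a columnar TN row. SORTED is a hypothetical dataset name.

```sas
/* Sketch only: grouping by STATION and DATETIME is an assumption; */
/* substitute the full list of identifying variables as needed.   */
PROC SORT DATA=COLUMNDATA OUT=SORTED ;
  BY STATION DATETIME ;
RUN ;

DATA TN (KEEP=STATION DATETIME PARAMETER VALUE) ;
  SET SORTED ;
  BY STATION DATETIME ;
  RETAIN TKNT NOXT NOXD ;
  /* Reset the held values at the start of each sample. */
  IF FIRST.DATETIME THEN DO ;
    TKNT = . ;
    NOXT = . ;
    NOXD = . ;
  END ;
  /* Remember the value of each parameter of interest. */
  SELECT (PARAMETER) ;
    WHEN ('TKNT') TKNT = VALUE ;
    WHEN ('NOXT') NOXT = VALUE ;
    WHEN ('NOXD') NOXD = VALUE ;
    OTHERWISE ;
  END ;
  /* On the last row of the sample, emit one TN row if TKNT was present. */
  IF LAST.DATETIME AND TKNT NE . THEN DO ;
    IF NOXD NE . THEN VALUE = TKNT + NOXD ;
    ELSE IF NOXT NE . THEN VALUE = TKNT + NOXT ;
    ELSE VALUE = TKNT ;
    PARAMETER = 'TN' ;
    OUTPUT ;
  END ;
RUN ;
```

Samples without a TKNT value, such as GHI in the example data, produce no TN row, which matches the desired output shown in the first post.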

I think this counts as doing it directly?

/Linus
Data never sleeps

Re: creating new rows

Linus-

Thanks so much for your help!

Your code worked beautifully, and I was working on cleaning up the output and how to get the rest of the unique identifiers from the original dataset.

While I was in that process, one of our lab guys came up with a subquery in SQL that did the trick while keeping everything in place, and I thought you might like to see it.

Wendy

Here it is (&UNIVARS is the list of identifiers that together make each sample unique):

PROC SQL NOPRINT ;
  CREATE TABLE TN AS
  SELECT
    &UNIVARS,
    SUM(VALUE) AS VALUE
  FROM EDDATA.IRLDATA
  WHERE PARAMETER IN('TKNT','NOXD') AND VALUE NE .
    AND EXISTS
      (SELECT VALUE FROM EDDATA.IRLDATA WHERE IRLDATA.PARAMETER='NOXD')
    OR IRLDATA.PARAMETER IN('TKNT','NOXT')
    AND NOT EXISTS
      (SELECT VALUE FROM EDDATA.IRLDATA WHERE IRLDATA.PARAMETER = 'NOXD')
  GROUP BY &UNIVARS
  HAVING IRLDATA.PARAMETER = 'TKNT' AND VALUE NE . ;
QUIT ;
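
One thing to watch in the query above: the EXISTS subqueries are not correlated with the outer row, so they test whether any NOXD row exists anywhere in the table rather than for the same sample. A possible alternative, offered only as a sketch, is conditional aggregation: collapse each sample to one row with MAX(CASE ...) and then apply the same rule as the original IF block. It assumes the table and column names from the first post (COLUMNDATA, STATION, DATETIME); in the real table the key list would be &UNIVARS.

```sas
/* Sketch only: COLUMNDATA and the STATION/DATETIME keys come from the */
/* example in the first post, not from the production table.           */
PROC SQL ;
  CREATE TABLE TN AS
  SELECT STATION,
         DATETIME,
         'TN' AS PARAMETER,
         CASE
           WHEN NOXD NE . THEN TKNT + NOXD   /* prefer NOXD when present */
           WHEN NOXT NE . THEN TKNT + NOXT   /* otherwise fall back to NOXT */
           ELSE TKNT                         /* neither present: TKNT alone */
         END AS VALUE
  FROM (SELECT STATION,
               DATETIME,
               /* MAX ignores missing values, so each column holds the   */
               /* parameter's value if it exists for the sample, else .  */
               MAX(CASE WHEN PARAMETER = 'TKNT' THEN VALUE ELSE . END) AS TKNT,
               MAX(CASE WHEN PARAMETER = 'NOXT' THEN VALUE ELSE . END) AS NOXT,
               MAX(CASE WHEN PARAMETER = 'NOXD' THEN VALUE ELSE . END) AS NOXD
        FROM COLUMNDATA
        GROUP BY STATION, DATETIME)
  WHERE TKNT NE . ;
QUIT ;
```

This still pivots the three parameters internally, but it does so inside a single query and does not require the data to be sorted first.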