WendyT
Pyrite | Level 9
I’m switching my data source from an old matrix dataset over to columnar data from an Oracle table, but need some help on generating a new variable based on a series of conditions.

In the old system:

STATION DATETIME TKNT NOXT NOXD
ABC 200903251200 1.4 0.5 0.1
DEF 200903251300 2.0 0.3 .
GHI 200903251400 . 0.6 0.1
JKL 200903251500 1.3 . .

Using this code in a DATA step

DATA NEWMATRIX ;
  SET MATRIXDATA ;
  …stuff…
  IF TN = . THEN DO ;
    IF TKNT EQ . THEN TN = . ;
    ELSE IF NOXD NE . THEN TN = TKNT + NOXD ;
    ELSE IF NOXT NE . THEN TN = TKNT + NOXT ;
    ELSE TN = TKNT ;
  END ;
  …more stuff…
RUN ;

Gives this:

STATION DATETIME TKNT NOXT NOXD TN
ABC 200903251200 1.4 0.5 0.1 1.5
DEF 200903251300 2.0 0.3 . 2.3
GHI 200903251400 . 0.6 0.1 .
JKL 200903251500 1.3 . . 1.3

Now, my data looks like this:

STATION DATETIME PARAMETER VALUE
ABC 200903251200 TKNT 1.4
ABC 200903251200 NOXT 0.5
ABC 200903251200 NOXD 0.1
DEF 200903251300 TKNT 2.0
DEF 200903251300 NOXT 0.3
GHI 200903251400 NOXT 0.6
…etc…

So how do I get here?

DATA NEWCOLUMN ; SET COLUMNDATA ;
Then what???

STATION DATETIME PARAMETER VALUE
ABC 200903251200 TN 1.5
DEF 200903251300 TN 2.3
JKL 200903251500 TN 1.3

Thanks so much for any references or help you can give me!

Wendy
LinusH
Tourmaline | Level 20
I don't really follow you all the way. Which table is your desired final result?

It seems that you want to transpose the data in some way. You can accomplish this by using multiple OUTPUT statements after you have assigned the new columns their appropriate values. You might also have a look at PROC TRANSPOSE.
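As a rough illustration of the PROC TRANSPOSE route, here is a minimal sketch. The dataset names COLUMNDATA, SORTED and WIDE are assumptions, the sample rows are copied from the question, and DATETIME is read as character purely for convenience.

```sas
/* Sample rows from the question; COLUMNDATA is an assumed name. */
data columndata;
  input station $ datetime :$12. parameter $ value;
  datalines;
ABC 200903251200 TKNT 1.4
ABC 200903251200 NOXT 0.5
ABC 200903251200 NOXD 0.1
DEF 200903251300 TKNT 2.0
DEF 200903251300 NOXT 0.3
GHI 200903251400 NOXT 0.6
;
run;

/* BY-group processing needs the data sorted by the ID columns. */
proc sort data=columndata out=sorted;
  by station datetime;
run;

/* One output row per STATION/DATETIME, one column per PARAMETER. */
proc transpose data=sorted out=wide (drop=_name_);
  by station datetime;
  id parameter;
  var value;
run;
```

WIDE then has the columns TKNT, NOXT and NOXD again, with missing values wherever a parameter was not reported, so the old IF block can be applied to it unchanged.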

Regards,
Linus
Data never sleeps
WendyT
Pyrite | Level 9
Linus - I’m sorry that I didn’t make myself clear in the original post. I'm looking for a way to do the same calculation without a transposition step.

The point I was trying to make is that my data used to be in matrix format, and to make a new parameter (TN), it was quite simple to combine existing parameters when they were all in separate columns with a unique ID for each dataline.

STATION DATETIME TKNT NOXT NOXD

Thus, to make the new column of TN, I used

IF TN = . THEN DO ;
  IF TKNT EQ . THEN TN = . ;
  ELSE IF NOXD NE . THEN TN = TKNT + NOXD ;
  ELSE IF NOXT NE . THEN TN = TKNT + NOXT ;
  ELSE TN = TKNT ;
END ;

My old matrix dataset no longer exists, and my data now exists in columnar format, so I have multiple rows per id, one parameter per line. So the values TKNT, NOXT, & NOXD now are contained in the parameter column, and the numeric value is contained in the column VALUE.

STATION DATETIME PARAMETER VALUE

My question is: how do I perform the same calculation from above without pulling out the relevant rows and flipping them to matrix format first?

The main dataset currently runs about 1.1 million records, so it would be really nice to do this directly.

Thanks for any help you can give me!

Wendy
LinusH
Tourmaline | Level 20
I think you still need some kind of transposing, but it can be done in one step if your data is already sorted by your ID column (Station). Something like this:

data tn;
  set notn;
  by station;
  retain tknt noxt noxd;
  if first.station then do;
    tknt = .;
    noxt = .;
    noxd = .;
  end;
  select(parameter);
    when ('TKNT') tknt = value;
    when ('NOXT') noxt = value;
    when ('NOXD') noxd = value;
    otherwise;
  end;
  if last.station then do;
    /* your IF block goes here */
    output;
  end;
run;

I think you can call this doing it directly?
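For reference, here is a sketch of the same idea with the IF block from the question filled in. It makes several assumptions: the names COLUMNDATA, SORTED and TN_ROWS are made up, each sample is identified by STATION plus DATETIME (so the BY group is both columns rather than STATION alone), DATETIME is read as character, and no TN rows already exist in the input.

```sas
/* Sample rows from the question; COLUMNDATA is an assumed name. */
data columndata;
  input station $ datetime :$12. parameter $ value;
  datalines;
ABC 200903251200 TKNT 1.4
ABC 200903251200 NOXT 0.5
ABC 200903251200 NOXD 0.1
DEF 200903251300 TKNT 2.0
DEF 200903251300 NOXT 0.3
GHI 200903251400 NOXT 0.6
;
run;

proc sort data=columndata out=sorted;
  by station datetime;
run;

data tn_rows (keep=station datetime parameter value);
  set sorted;
  by station datetime;
  retain tknt noxt noxd;
  /* Reset the held values at the start of each sample. */
  if first.datetime then do;
    tknt = .;
    noxt = .;
    noxd = .;
  end;
  select (parameter);
    when ('TKNT') tknt = value;
    when ('NOXT') noxt = value;
    when ('NOXD') noxd = value;
    otherwise;
  end;
  /* On the last row of the sample, write one TN row if TKNT exists. */
  if last.datetime and tknt ne . then do;
    if noxd ne . then value = tknt + noxd;
    else if noxt ne . then value = tknt + noxt;
    else value = tknt;
    parameter = 'TN';
    output;
  end;
run;
```

With the sample rows above this writes TN = 1.5 for ABC and TN = 2.3 for DEF, and nothing for GHI because it has no TKNT. The output keeps the STATION / DATETIME / PARAMETER / VALUE layout, so it could be appended back onto the columnar data.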

/Linus
Data never sleeps
WendyT
Pyrite | Level 9
Linus-

Thanks so much for your help!

Your code worked beautifully, and I was working on cleaning up the output and how to get the rest of the unique identifiers from the original dataset.

While I was in that process, one of our lab guys came up with a subquery in SQL that did the trick while keeping everything in place, and I thought you might like to see it.

Wendy

Here it is: (&UNIVARS is the list of identifiers that generates uniqueness)

PROC SQL NOPRINT ;
  CREATE TABLE TN AS
  SELECT
    &UNIVARS,
    SUM(VALUE) AS VALUE
  FROM EDDATA.IRLDATA
  WHERE PARAMETER IN('TKNT','NOXD') AND VALUE NE .
    AND EXISTS
      (SELECT VALUE FROM EDDATA.IRLDATA WHERE IRLDATA.PARAMETER='NOXD')
    OR IRLDATA.PARAMETER IN('TKNT','NOXT')
    AND NOT EXISTS
      (SELECT VALUE FROM EDDATA.IRLDATA WHERE IRLDATA.PARAMETER = 'NOXD')
  GROUP BY &UNIVARS
  HAVING IRLDATA.PARAMETER = 'TKNT' AND VALUE NE . ;
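
For anyone reading later, here is a different way to express the original IF logic in PROC SQL, using conditional aggregation instead of the subqueries above. This is only a sketch under assumptions: COLUMNDATA and TN_SQL are made-up names, STATION and DATETIME stand in for the &UNIVARS list, and the sample rows are the ones from the question.

```sas
/* Sample rows from the question; COLUMNDATA is an assumed name. */
data columndata;
  input station $ datetime :$12. parameter $ value;
  datalines;
ABC 200903251200 TKNT 1.4
ABC 200903251200 NOXT 0.5
ABC 200903251200 NOXD 0.1
DEF 200903251300 TKNT 2.0
DEF 200903251300 NOXT 0.3
GHI 200903251400 NOXT 0.6
;
run;

proc sql noprint;
  create table tn_sql as
  select station,
         datetime,
         'TN' as parameter,
         /* TKNT plus NOXD if present, else NOXT if present, else 0 */
         max(case when parameter = 'TKNT' then value else . end)
         + coalesce(max(case when parameter = 'NOXD' then value else . end),
                    max(case when parameter = 'NOXT' then value else . end),
                    0) as value
  from columndata
  group by station, datetime
  /* Keep only samples that actually have a TKNT value. */
  having max(case when parameter = 'TKNT' then value else . end) is not null;
quit;
```

Each CASE expression picks out one parameter within the STATION/DATETIME group, and COALESCE applies the same NOXD-then-NOXT preference as the DATA step version. On the sample rows this should give 1.5 for ABC and 2.3 for DEF, with no row for GHI.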
