Quartz | Level 8

## Dataset Transformation Help Wanted

I need some help getting a dataset transformed.  The goal is to take dataset HAVE and turn it into dataset WANT. See below for details.  I don't know how to accomplish this task.  Thanks in advance for your help.

Gene

data have;
input Target \$ Tx Ty Station \$ k;
datalines;
0/0 0 0 A 1
0/0 0 0 B 0
0/0 0 0 C 0
0/1 0 1 A 1
0/1 0 1 B 1
0/1 0 1 C 0
1/0 1 0 A 1
1/0 1 0 B 1
1/0 1 0 C 1
;
run;
proc sort data=have;
by target;
run;

/*Need code here (Proc Transpose?) that produces the following dataset from dataset Have
where ksum is the sum of k for each unique Target value*/

Data want;
input Target \$ Tx Ty ksum S1 \$ S2 \$ S3 \$;
datalines;
0/0 0 0 1 A . .
0/1 0 1 2 A B .
1/0 1 0 3 A B C
;
run;

/* The goal is to  create a scatter plot of Tx by Ty grouped by ksum with datatips
that display Target,ksum,S1,S2,S3.  See below*/

ods graphics on/imagemap;
Proc SGPlot data=want;
scatter x=tx y=ty/group=ksum tip=(Target ksum S1 S2 S3);
xaxis min=-2 max=2;
yaxis min=-2 max=2;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Dataset Transformation Help Wanted

Can you indicate exactly which rows/values you are summing to get the output data set?

From what little I understand there is more than one way to get several of those "sums".

This seems to work for your example data.

I'm not sure how extensible it might be.

```data want;
set have;
array s(3) \$;
retain ksum s1-s3 counter;
by target;
if first.target then do;
call missing(ksum, of s(*),counter);

end;
if k then do;
counter+1;
ksum+k;
s[counter] = station;
end;
if last.target then output;
drop station k counter;
run;

```

You might want to use either a custom format to display something like N/A for the S variables when the value is missing or set Options missing=' '; before generating the graph to make slightly cleaner tip appearance.

4 REPLIES 4
Super User

## Re: Dataset Transformation Help Wanted

Can you indicate exactly which rows/values you are summing to get the output data set?

From what little I understand there is more than one way to get several of those "sums".

This seems to work for your example data.

I'm not sure how extensible it might be.

```data want;
set have;
array s(3) \$;
retain ksum s1-s3 counter;
by target;
if first.target then do;
call missing(ksum, of s(*),counter);

end;
if k then do;
counter+1;
ksum+k;
s[counter] = station;
end;
if last.target then output;
drop station k counter;
run;

```

You might want to use either a custom format to display something like N/A for the S variables when the value is missing or set Options missing=' '; before generating the graph to make slightly cleaner tip appearance.

Quartz | Level 8

## Re: Dataset Transformation Help Wanted

Thanks to all who responded with solutions.  They all work but I will mark ballardw's solution as "accepted" since his was first.  Thanks again for your help.

Gene

PROC Star

## Re: Dataset Transformation Help Wanted

If K is always a 1 or 0, then as ksum increments, then next S variable (S1, S2, S3) is assigned the corresponding station value:

``````data have;
input Target \$ Tx Ty Station \$ k;
datalines;
0/0 0 0 A 1
0/0 0 0 B 0
0/0 0 0 C 0
0/1 0 1 A 1
0/1 0 1 B 1
0/1 0 1 C 0
1/0 1 0 A 1
1/0 1 0 B 1
1/0 1 0 C 1
run;
data want (drop=k station);
set have;
by target;
retain ksum 0 s1-s3 '  ';
array s{3};
if first.target then call missing (ksum,s1,s2,s3);
ksum+k;
if k=1 then s{ksum}=station;
if last.target;
run;``````
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Super User

## Re: Dataset Transformation Help Wanted

No idea how this relates to making a graphic.

But you just need a DO loop.

``````data have;
input Target \$ Tx Ty Station \$ k;
datalines;
0/0 0 0 A 1
0/0 0 0 B 0
0/0 0 0 C 0
0/1 0 1 A 1
0/1 0 1 B 1
0/1 0 1 C 0
1/0 1 0 A 1
1/0 1 0 B 1
1/0 1 0 C 1
;

data want;
do until(last.target) ;
set have ;
by target;
array S[3] \$8;
ksum=sum(ksum,k);
if k then S[ksum]=station;
end;
run;``````

Result

```Obs    Target    Tx    Ty    Station    k    S1    S2    S3    ksum

1      0/0       0     0       C       0    A                   1
2      0/1       0     1       C       0    A     B             2
3      1/0       1     0       C       1    A     B     C       3

```

Change the dimension of the ARRAY if there might be more than three stations for a target you need to record.

Discussion stats
• 4 replies
• 662 views
• 1 like
• 4 in conversation