Hi Guys
Need some help with this problem.
This is the two tables I have:
data sample;
infile datalines missover;
input Date $ Type1 Type2 $ Money P_10 P_108 P_9 O_100 O_98 O_5 P_50 P_3 O_355;
datalines;
20170101 0 A 100 50 10 . 40
20170101 0 A 110 . . . . 110
20170101 1 B 1200 200 . . . . 500 . . 500
20170101 1 B 202 100 . . . . . 102
20170101 5 C 5 . . . . 5
;
run;
data reference_id;
infile datalines missover;
input ID Descrip $ Status $ category $;
datalines;
10 P PAID GOOD
108 P PAID GOOD
9 P PAID BAD
100 M PAID GOOD
98 M PAID BAD
5 M PAID GOOD
50 P PAID BAD
3 P PAID GOOD
355 M PAID N/A
;
run;
I need to extract all the ID's from sample into an array (remove the prefix) and join it with the ID properties in the reference_id table.
The ID should be replaced with its respective properties e.g P_10 in sample table equals ID 10 in reference_id table - the output should read P_PAID_GOOD.
This is the output I desire.
data output;
infile datalines missover;
input Date $ Type1 Type2 $ Money P_PAID_GOOD P_PAID_BAD M_PAID_GOOD M_PAID_BAD M_PAID_N_A ;
datalines;
20170101 0 A 100 60 . 40
20170101 0 A 110 . . . 110
20170101 1 B 1200 200 . 500 . 500
20170101 1 B 202 100 . . 102 .
20170101 5 C 5 . . . 5 .
;
run;
I am not sure about your output dataset, this doesn't seem to match what you say in the post.
This is where I got to, so can be a start for you:
data sample; infile datalines missover; input Date $ Type1 Type2 $ Money P_10 P_108 P_9 O_100 O_98 O_5 P_50 P_3 O_355; datalines; 20170101 0 A 100 50 10 . 40 20170101 0 A 110 . . . . 110 20170101 1 B 1200 200 . . . . 500 . . 500 20170101 1 B 202 100 . . . . . 102 20170101 5 C 5 . . . . 5 ; run; proc transpose data=sample out=inter; by date type1 type2 money notsorted; var p_: o_:; run; data inter; set inter; id=input(scan(_name_,2,"_"),best.); run; data reference_id; infile datalines missover; input ID Descrip $ Status $ category $; datalines; 10 P PAID GOOD 108 P PAID GOOD 9 P PAID BAD 100 M PAID GOOD 98 M PAID BAD 5 M PAID GOOD 50 P PAID BAD 3 P PAID GOOD 355 M PAID N/A ; run; proc sql; create table COMB as select distinct A.DATE, A.TYPE1, A.TYPE2, A.MONEY, catx("_",B.STATUS,B.CATEGORY) as LAB, sum(COL1) as VAL from (select * from INTER where COL1 ne .) A left join REFERENCE_ID B on A.ID=B.ID group by A.DATE, A.TYPE1, A.TYPE2, A.MONEY, catx("_",B.STATUS,B.CATEGORY); quit; proc transpose data=comb out=want; by date type1 type2 money; var val; id lab; run;
I am not sure about your output dataset, this doesn't seem to match what you say in the post.
This is where I got to, so can be a start for you:
data sample; infile datalines missover; input Date $ Type1 Type2 $ Money P_10 P_108 P_9 O_100 O_98 O_5 P_50 P_3 O_355; datalines; 20170101 0 A 100 50 10 . 40 20170101 0 A 110 . . . . 110 20170101 1 B 1200 200 . . . . 500 . . 500 20170101 1 B 202 100 . . . . . 102 20170101 5 C 5 . . . . 5 ; run; proc transpose data=sample out=inter; by date type1 type2 money notsorted; var p_: o_:; run; data inter; set inter; id=input(scan(_name_,2,"_"),best.); run; data reference_id; infile datalines missover; input ID Descrip $ Status $ category $; datalines; 10 P PAID GOOD 108 P PAID GOOD 9 P PAID BAD 100 M PAID GOOD 98 M PAID BAD 5 M PAID GOOD 50 P PAID BAD 3 P PAID GOOD 355 M PAID N/A ; run; proc sql; create table COMB as select distinct A.DATE, A.TYPE1, A.TYPE2, A.MONEY, catx("_",B.STATUS,B.CATEGORY) as LAB, sum(COL1) as VAL from (select * from INTER where COL1 ne .) A left join REFERENCE_ID B on A.ID=B.ID group by A.DATE, A.TYPE1, A.TYPE2, A.MONEY, catx("_",B.STATUS,B.CATEGORY); quit; proc transpose data=comb out=want; by date type1 type2 money; var val; id lab; run;
Here's a different start. I just don't have time to finish it.
Because there are so few entries in the second table, taking on a limited set of numeric ID values, this is a good candidate for a rarely uised technique.
data want;
array st {355} $ 1 _temporary_;
array cat {355} $ 4 _temporary_;
if _n_=1 then do until (done);
set reference_id end=done;
st{id} = status;
cat{id} = category;
end;
This gives you two temporary arrays meaning their elements are automatically retained, and automatically dropped at the end of the DATA step. Most of the array elements are missing, but that won't harm anything. It doesn't take much memory to hold 355 elements.
What's left to do in the same DATA step:
set sample;
array entries {9} P_10 P_108 P_9 O_100 ... O_355;
do i=1 to 9;
length P_vs_O $ 1;
P_vs_O = vname(entries{i});
ID = input(substr (vname(entries{i}), 3), 3.);
* Put pieces in the right buckets;
end;
*keep/drop as you see fit;
run;
There's probably a detail or two that I left out, but this approach will be as straightforward as it gets (unfortunately so, perhaps).
In my experience the long format is more useful.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.