I do have a data which I would like to create variables to flag them out. The values in cat are changes as columns. I am thinking there is a better way to do this instead of if-else statement.
Have
data cats;
input ID $ Cat $ Date $;
datalines;
1 A 1/17/2020
1 B 4/12/2020
2 A 2/21/2020
3 C 12/30/2020
4 B 1/5/2020
4 C 7/19/2020
5 A 6/25/2020
6 B 7/17/2020
7 B 6/20/2020
;
run;
Want
ID | cat_A | cat_B | cat_C | A_date | B_date | C_date |
1 | 1 | 1 | 0 | 1/17/2020 | 4/12/2020 | |
2 | 1 | 0 | 0 | 2/21/2020 | ||
3 | 0 | 0 | 1 | 12/30/2020 | ||
4 | 0 | 1 | 1 | 1/5/2020 | 7/19/2020 | |
5 | 1 | 0 | 0 | 6/25/2020 | ||
6 | 0 | 1 | 0 | 7/17/2020 | ||
7 | 0 | 1 | 0 | 6/20/2020 |
In my opinion, the better way to do this is to leave the data set long, instead of making it wide with awkward column names.
What is the next step after your obtain this wide data set? What analysis or report are you planning to produce? Most analyses and reports are more easily produced from long data sets.
Plenty of ways to de-duplicate data without creating this long data set with awkward variable names.
But you didn't explain what the next analysis or report is going to be, so it would help greatly to know what that is. Without that knowledge of what is next, I'm not going to try to create code for this problem. Long data sets can be used for lookup tables as well.
@PaigeMiller wrote:
Long data sets can be used for lookup tables as well.
Usually easier to do a lookup from a long data set since you can merge easily.
@kashun wrote:
@PaigeMiller. The data has duplicates and I do need it to be one id per row. This will be used as a lookup table to run other programs.
Duplicates of cat will cause problems when transposing. Searching for duplicates is always easier in the long dataset.
data cats;
input ID $ Cat $ Date $;
datalines;
1 A 1/17/2020
1 B 4/12/2020
2 A 2/21/2020
3 C 12/30/2020
4 B 1/5/2020
4 C 7/19/2020
5 A 6/25/2020
6 B 7/17/2020
7 B 6/20/2020
;
run;
proc transpose data=cats out=cats_wide prefix=GROUP_;
by ID;
id CAT;
var DATE;
run;
data want;
set cats_wide;
array flags(*) flag1-flag3;
array GROUP(*) group_A--group_c;
do i=1 to 3;
flags(i) = not missing(group(i)) ;
end;
drop i _name_;
run;
proc print data=want;
run;
Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/
Wide to Long:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/
And sometimes a double transpose is needed for extra wide data sets:
https://gist.github.com/statgeek/2321b6f62ab78d5bf2b0a5a8626bd7cd
@kashun wrote:
I do have a data which I would like to create variables to flag them out. The values in cat are changes as columns. I am thinking there is a better way to do this instead of if-else statement.
Have
data cats; input ID $ Cat $ Date $; datalines; 1 A 1/17/2020 1 B 4/12/2020 2 A 2/21/2020 3 C 12/30/2020 4 B 1/5/2020 4 C 7/19/2020 5 A 6/25/2020 6 B 7/17/2020 7 B 6/20/2020 ; run;
Want
ID cat_A cat_B cat_C A_date B_date C_date 1 1 1 0 1/17/2020 4/12/2020 2 1 0 0 2/21/2020 3 0 0 1 12/30/2020 4 0 1 1 1/5/2020 7/19/2020 5 1 0 0 6/25/2020 6 0 1 0 7/17/2020 7 0 1 0 6/20/2020
Merge Skill proposed by me ,Art.T and Matt.
https://support.sas.com/resources/papers/proceedings15/2785-2015.pdf
data cats;
input ID $ Cat $ Date $;
datalines;
1 A 1/17/2020
1 B 4/12/2020
2 A 2/21/2020
3 C 12/30/2020
4 B 1/5/2020
4 C 7/19/2020
5 A 6/25/2020
6 B 7/17/2020
7 B 6/20/2020
;
run;
proc sql noprint;
select distinct catt('cats(where=(cat_',cat,'="',cat,'") rename=(cat=cat_',cat,' date=',cat,'_date))')
into : merge separated by ' '
from cats;
quit;
data want;
merge &merge.;
by id;
run;
data want;
set want;
array x{*} $ cat_:;
do i=1 to dim(x);
if missing(x{i}) then x{i}='0';
else x{i}='1';
end;
drop i;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.