Hi All,
From the below master table, I need to split word as separate row when encounter semicolon with same name and role.
After that, I need to match comment column from master dataset with area column of labelling dataset. If matches, same text will be placed, if not then "Other" should populate (e.g.User4). For blank it should be kept as it is.
I am not aware how to keep the data when multiple semicolon available. I just placed raw data.
data master;
length comment $ 200;
input name $ role $ comment $;
cards;
User1 clerk Oncology;Infectious disease;Cardiovascular disease;Respiratory;
User2 srclerk .
User4 peon Oncology;Immune;
User5 manager Oncology;Neurosciences;Immunology;Infectious disease;Dermatology;Rare Diseases - Multiple Indications, Multiple Disease areas;
run;
data labelling;
input area $200.;
cards;
Immunology
Infectious disease
Dermatology
Cardiovascular disease
Rare Diseases - Multiple Indications, Multiple Disease areas
Vaccines
;
run;
My expected output
name | role | comment | flag |
User1 | clerk | Oncology | 1 |
User1 | clerk | Infectious disease | 1 |
User1 | clerk | Cardiovascular disease | 1 |
User1 | clerk | Other | 1 |
User2 | srclerk | . | 1 |
User4 | peon | Oncology | 1 |
User4 | peon | Other | 1 |
User5 | manager | Oncology | 1 |
User5 | manager | Neurosciences | 1 |
User5 | manager | Immunology | 1 |
User5 | manager | Infectious disease | 1 |
User5 | manager | Dermatology | 1 |
User5 | manager | Rare Diseases - Multiple Indications, Multiple Disease areas | 1 |
Use a DO loop and the SCAN() function.
data have;
infile cards truncover;
input name :$10. role :$10. comment $200.;
cards4;
User1 clerk Oncology;Infectious disease;Cardiovascular disease;Respiratory;
User2 srclerk .
User4 peon Oncology;Immune;
User5 manager Oncology;Neurosciences;Immunology;Infectious disease;Dermatology;Rare Diseases - Multiple Indications, Multiple Disease areas;
;;;;
data want;
set have;
do item=1 to max(1,countw(comment,';'));
area = scan(comment,item,';');
if area ne ' ' or item=1 then output;
end;
run;
Results:
Obs name role item area 1 User1 clerk 1 Oncology 2 User1 clerk 2 Infectious disease 3 User1 clerk 3 Cardiovascular disease 4 User1 clerk 4 Respiratory 5 User2 srclerk 1 6 User4 peon 1 Oncology 7 User4 peon 2 Immune 8 User5 manager 1 Oncology 9 User5 manager 2 Neurosciences 10 User5 manager 3 Immunology 11 User5 manager 4 Infectious disease 12 User5 manager 5 Dermatology 13 User5 manager 6 Rare Diseases - Multiple Indications, Multiple Disease areas
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.