Thank you very much in advance!
I use SAS 9.4. I want to add a column to my sas dataset. Generally, I know how it can be done. However, the condition under which I put values in the new column is complicated. So. I had to ask this question. I also apoligize if my statements for describing the problem are not very straightforward.
My original dataset is very large, so I provided a part of it here:
| ENROLID | DX1 | DX2 | DX3 | DX4 | DX5 | DX6 | DX7 | DX8 | DX9 | DX10 | 
| 133058304 | 73730 | 51889 | V151 | 73739 | V454 | V5489 | V5849 | 4271 | 4589 | 7452 | 
| 133058304 | 51884 | 7452 | 75672 | 4660 | 51889 | 7931 | 4293 | 78609 | 5180 | |
| 133058304 | 51881 | 2761 | 5180 | 51889 | 7931 | 7452 | 49390 | 4589 | 486 | 78609 | 
| 133062302 | 486 | 78060 | 78701 | 481 | 78650 | |||||
| 133062601 | 8208 | 2720 | 4010 | 41400 | 49390 | 8209 | 2724 | 71945 | 4019 | |
| 133062601 | 41400 | 4111 | 496 | 41070 | 4019 | |||||
| 133062601 | 1623 | 4019 | 41400 | 1629 | 7866 | 7856 | ||||
| 133064602 | 40311 | 5849 | 5855 | 27651 | 4280 | 5859 | 5939 | 32723 | 25040 | 496 | 
| 133064602 | 5849 | 2762 | 40391 | 586 | 78050 | 78079 | V5882 | 4293 | 2449 | 25000 | 
| 133064602 | 99673 | 40391 | 5856 | 78650 | 25000 | 5859 | 586 | 78659 | ||
| 133068301 | V5811 | 20150 | 2859 | 20280 | 45386 | 78701 | ||||
| 133068301 | V5811 | 20151 | 78060 | 20150 | 2768 | |||||
| 133068301 | 28800 | 20190 | 2874 | 78061 | ||||||
| 133068301 | 20158 | 4233 | 486 | 20150 | 20190 | 2859 | 78060 | 20280 | 42789 | 51889 | 
In this data, I want to add a new column based on the values under variables "DX1, ..., DX10."
For each observation (i.e., row), if any of the values under these variables (i.e., DX1, ..., DX10) follow the numbers provided in the table below, then the value of the new column (for that specific row) will be one of 7 categories (shown in the excel file). So, to make it clear, the elements of the new column should be one of those 7 categories.
| category 1 | category 2 | category 3 | category 4 | category 5 | category 6 | category 7 | 
| 30780 | 2761 | 30400 | 30550 | 5859 | 96501 | 5855 | 
| 4019 | 20158 | 51889 | 30551 | 78650 | E8500 | 7452 | 
| 25000 | 33900 | 30402 | 30552 | 96502 | 51883 | |
| 30789 | 33901 | 30403 | 30553 | 96509 | 7489 | |
| 3380 | 33902 | 30470 | E8501 | 78600 | ||
| 33811 | 33905 | 30471 | 7991 | |||
| 33812 | 33910 | 30472 | 99739 | |||
| 33818 | 33911 | 30473 | ||||
| 33819 | 33912 | |||||
| 33821 | 33920 | |||||
| 33828 | ||||||
| 33829 | ||||||
| 3384 | ||||||
| 3502 | ||||||
| 37991 | 
I will be very thankful if anyone can help me in this regard!
Here is one way:
data have;
  infile cards dlm='09'x truncover;
  informat ENROLID $9.;
  input ENROLID	(DX1	DX2	DX3	DX4	DX5	DX6	DX7	DX8	DX9	DX10) ($);
  cards;
133058304	73730	51889	V151	73739	V454	V5489	V5849	4271	4589	7452
133058304	51884	7452	75672	4660	51889	7931	4293	78609	5180	 
133058304	51881	2761	5180	51889	7931	7452	49390	4589	486	78609
133062302	486	78060	78701	481	78650	 	 	 	 	 
133062601	8208	2720	4010	41400	49390	8209	2724	71945	4019	 
133062601	41400	4111	496	41070	4019	 	 	 	 	 
133062601	1623	4019	41400	1629	7866	7856	 	 	 	 
133064602	40311	5849	5855	27651	4280	5859	5939	32723	25040	496
133064602	5849	2762	40391	586	78050	78079	V5882	4293	2449	25000
133064602	99673	40391	5856	78650	25000	5859	586	78659	 	 
133068301	V5811	20150	2859	20280	45386	78701	 	 	 	 
133068301	V5811	20151	78060	20150	2768	 	 	 	 	 
133068301	28800	20190	2874	78061	 	 	 	 	 	 
133068301	20158	4233	486	20150	20190	2859	78060	20280	42789	51889
;
data want (drop=i);
  array c1(15) $ _temporary_ ('30780' '4019' '25000' '30789' '3380' '33811' '33812' '33818'
                 '33819' '33821' '33828' '33829' '3384' '3502' '37991');
  array c2(10) $ _temporary_ ('2761' '20158' '33900' '33901' '33902' '33905' '33910' '33911' '33912' '33920');
  array c3(8) $ _temporary_ ('30400' '51889' '30402' '30403' '30470' '30471' '30472' '30473');
  array c4(4) $ _temporary_ ('30550' '30551' '30552' '30553');
  array c5(5) $ _temporary_ ('5859' '78650' '96502' '96509' 'E8501');
  array c6(2) $ _temporary_ ('96501' 'E8500');
  array c7(7) $ _temporary_ ('5855' '7452' '51883' '7489' '78600' '7991' '99739');
  array dx(*) $  dx1-dx10;
  set have;
  do i=1 to dim(dx);
    if dx(i) in c1 then category=1;
    else if dx(i) in c2 then category=2;
    else if dx(i) in c3 then category=2;
    else if dx(i) in c4 then category=2;
    else if dx(i) in c5 then category=2;
    else if dx(i) in c6 then category=2;
    else if dx(i) in c7 then category=2;
    else call missing(category);
    if not missing(category) then leave;
  end;
run;
Art, CEO, AnalystFinder.com
What if multiple entries (in a row) match the values of two or more columns?
Art, CEO, AnalystFinder.com
In the actual data, that won't happen (categories are mutually exclusive). But, if that generally happens, we assign one category at random.
Here is one way:
data have;
  infile cards dlm='09'x truncover;
  informat ENROLID $9.;
  input ENROLID	(DX1	DX2	DX3	DX4	DX5	DX6	DX7	DX8	DX9	DX10) ($);
  cards;
133058304	73730	51889	V151	73739	V454	V5489	V5849	4271	4589	7452
133058304	51884	7452	75672	4660	51889	7931	4293	78609	5180	 
133058304	51881	2761	5180	51889	7931	7452	49390	4589	486	78609
133062302	486	78060	78701	481	78650	 	 	 	 	 
133062601	8208	2720	4010	41400	49390	8209	2724	71945	4019	 
133062601	41400	4111	496	41070	4019	 	 	 	 	 
133062601	1623	4019	41400	1629	7866	7856	 	 	 	 
133064602	40311	5849	5855	27651	4280	5859	5939	32723	25040	496
133064602	5849	2762	40391	586	78050	78079	V5882	4293	2449	25000
133064602	99673	40391	5856	78650	25000	5859	586	78659	 	 
133068301	V5811	20150	2859	20280	45386	78701	 	 	 	 
133068301	V5811	20151	78060	20150	2768	 	 	 	 	 
133068301	28800	20190	2874	78061	 	 	 	 	 	 
133068301	20158	4233	486	20150	20190	2859	78060	20280	42789	51889
;
data want (drop=i);
  array c1(15) $ _temporary_ ('30780' '4019' '25000' '30789' '3380' '33811' '33812' '33818'
                 '33819' '33821' '33828' '33829' '3384' '3502' '37991');
  array c2(10) $ _temporary_ ('2761' '20158' '33900' '33901' '33902' '33905' '33910' '33911' '33912' '33920');
  array c3(8) $ _temporary_ ('30400' '51889' '30402' '30403' '30470' '30471' '30472' '30473');
  array c4(4) $ _temporary_ ('30550' '30551' '30552' '30553');
  array c5(5) $ _temporary_ ('5859' '78650' '96502' '96509' 'E8501');
  array c6(2) $ _temporary_ ('96501' 'E8500');
  array c7(7) $ _temporary_ ('5855' '7452' '51883' '7489' '78600' '7991' '99739');
  array dx(*) $  dx1-dx10;
  set have;
  do i=1 to dim(dx);
    if dx(i) in c1 then category=1;
    else if dx(i) in c2 then category=2;
    else if dx(i) in c3 then category=2;
    else if dx(i) in c4 then category=2;
    else if dx(i) in c5 then category=2;
    else if dx(i) in c6 then category=2;
    else if dx(i) in c7 then category=2;
    else call missing(category);
    if not missing(category) then leave;
  end;
run;
Art, CEO, AnalystFinder.com
Thank you vrey much! But, when I run the code, both "Have" and "Want" data files only show ENROLID (i.e., DX1,...DX10 are not shown).
Furthermore, as I mentioned in my first message, the data provided is just a part of the original data (with almost 10 million observations). So, if I follow your approach, how can I accomodate the data?
Thanks again!
When I run the code I get enrollid, dx1-dx10, and category. You would have to show your log.
Should work regardless of how many records there are.
Art, CEO, AnalystFinder.com
I was able to find the answer. I used a part of your code. That DEFINITELY helped. Thanks a lot!
I am doing a similar analysis and would appreciate if you can post your solution.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.