Write and run SAS programs in your web browser

Add a new column based on the entries of other columns

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 13
Accepted Solution

Add a new column based on the entries of other columns

Thank you very much in advance!

 

I use SAS 9.4. I want to add a column to my sas dataset. Generally, I know how it can be done. However, the condition under which I put values in the new column is complicated. So. I had to ask this question. I also apoligize if my statements for describing the problem are not very straightforward.

 

My original dataset is very large, so I provided a part of it here:

ENROLIDDX1DX2DX3DX4DX5DX6DX7DX8DX9DX10
1330583047373051889V15173739V454V5489V5849427145897452
1330583045188474527567246605188979314293786095180 
1330583045188127615180518897931745249390458948678609
133062302486780607870148178650     
133062601820827204010414004939082092724719454019 
133062601414004111496410704019     
1330626011623401941400162978667856    
1330646024031158495855276514280585959393272325040496
13306460258492762403915867805078079V58824293244925000
133064602996734039158567865025000585958678659  
133068301V5811201502859202804538678701    
133068301V58112015178060201502768     
1330683012880020190287478061      
1330683012015842334862015020190285978060202804278951889

 

In this data, I want to add a new column based on the values under variables "DX1, ..., DX10."

For each observation (i.e., row), if any of the values under these variables (i.e., DX1, ..., DX10) follow the numbers provided in the table below, then the value of the new column (for that specific row) will be one of 7 categories (shown in the excel file). So, to make it clear, the elements of the new column should be one of those 7 categories.

 

category 1category 2category 3category 4category 5category 6category 7
30780276130400305505859965015855
401920158518893055178650E85007452
2500033900304023055296502 51883
3078933901304033055396509 7489
33803390230470 E8501 78600
338113390530471   7991
338123391030472   99739
338183391130473    
3381933912     
3382133920     
33828      
33829      
3384      
3502      
37991      

 

I will be very thankful if anyone can help me in this regard!

 


Accepted Solutions
Solution
‎07-04-2017 12:18 PM
PROC Star
Posts: 7,363

Re: Add a new column based on the entries of other columns

Here is one way:

data have;
  infile cards dlm='09'x truncover;
  informat ENROLID $9.;
  input ENROLID	(DX1	DX2	DX3	DX4	DX5	DX6	DX7	DX8	DX9	DX10) ($);
  cards;
133058304	73730	51889	V151	73739	V454	V5489	V5849	4271	4589	7452
133058304	51884	7452	75672	4660	51889	7931	4293	78609	5180	 
133058304	51881	2761	5180	51889	7931	7452	49390	4589	486	78609
133062302	486	78060	78701	481	78650	 	 	 	 	 
133062601	8208	2720	4010	41400	49390	8209	2724	71945	4019	 
133062601	41400	4111	496	41070	4019	 	 	 	 	 
133062601	1623	4019	41400	1629	7866	7856	 	 	 	 
133064602	40311	5849	5855	27651	4280	5859	5939	32723	25040	496
133064602	5849	2762	40391	586	78050	78079	V5882	4293	2449	25000
133064602	99673	40391	5856	78650	25000	5859	586	78659	 	 
133068301	V5811	20150	2859	20280	45386	78701	 	 	 	 
133068301	V5811	20151	78060	20150	2768	 	 	 	 	 
133068301	28800	20190	2874	78061	 	 	 	 	 	 
133068301	20158	4233	486	20150	20190	2859	78060	20280	42789	51889
;

data want (drop=i);
  array c1(15) $ _temporary_ ('30780' '4019' '25000' '30789' '3380' '33811' '33812' '33818'
                 '33819' '33821' '33828' '33829' '3384' '3502' '37991');
  array c2(10) $ _temporary_ ('2761' '20158' '33900' '33901' '33902' '33905' '33910' '33911' '33912' '33920');
  array c3(8) $ _temporary_ ('30400' '51889' '30402' '30403' '30470' '30471' '30472' '30473');
  array c4(4) $ _temporary_ ('30550' '30551' '30552' '30553');
  array c5(5) $ _temporary_ ('5859' '78650' '96502' '96509' 'E8501');
  array c6(2) $ _temporary_ ('96501' 'E8500');
  array c7(7) $ _temporary_ ('5855' '7452' '51883' '7489' '78600' '7991' '99739');
  array dx(*) $  dx1-dx10;
  set have;
  do i=1 to dim(dx);
    if dx(i) in c1 then category=1;
    else if dx(i) in c2 then category=2;
    else if dx(i) in c3 then category=2;
    else if dx(i) in c4 then category=2;
    else if dx(i) in c5 then category=2;
    else if dx(i) in c6 then category=2;
    else if dx(i) in c7 then category=2;
    else call missing(category);
    if not missing(category) then leave;
  end;
run;

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
PROC Star
Posts: 7,363

Re: Add a new column based on the entries of other columns

What if multiple entries (in a row) match the values of two or more columns?

 

Art, CEO, AnalystFinder.com

 

Occasional Contributor
Posts: 13

Re: Add a new column based on the entries of other columns

[ Edited ]

In the actual data, that won't happen (categories are mutually exclusive). But, if that generally happens, we assign one category at random.

Solution
‎07-04-2017 12:18 PM
PROC Star
Posts: 7,363

Re: Add a new column based on the entries of other columns

Here is one way:

data have;
  infile cards dlm='09'x truncover;
  informat ENROLID $9.;
  input ENROLID	(DX1	DX2	DX3	DX4	DX5	DX6	DX7	DX8	DX9	DX10) ($);
  cards;
133058304	73730	51889	V151	73739	V454	V5489	V5849	4271	4589	7452
133058304	51884	7452	75672	4660	51889	7931	4293	78609	5180	 
133058304	51881	2761	5180	51889	7931	7452	49390	4589	486	78609
133062302	486	78060	78701	481	78650	 	 	 	 	 
133062601	8208	2720	4010	41400	49390	8209	2724	71945	4019	 
133062601	41400	4111	496	41070	4019	 	 	 	 	 
133062601	1623	4019	41400	1629	7866	7856	 	 	 	 
133064602	40311	5849	5855	27651	4280	5859	5939	32723	25040	496
133064602	5849	2762	40391	586	78050	78079	V5882	4293	2449	25000
133064602	99673	40391	5856	78650	25000	5859	586	78659	 	 
133068301	V5811	20150	2859	20280	45386	78701	 	 	 	 
133068301	V5811	20151	78060	20150	2768	 	 	 	 	 
133068301	28800	20190	2874	78061	 	 	 	 	 	 
133068301	20158	4233	486	20150	20190	2859	78060	20280	42789	51889
;

data want (drop=i);
  array c1(15) $ _temporary_ ('30780' '4019' '25000' '30789' '3380' '33811' '33812' '33818'
                 '33819' '33821' '33828' '33829' '3384' '3502' '37991');
  array c2(10) $ _temporary_ ('2761' '20158' '33900' '33901' '33902' '33905' '33910' '33911' '33912' '33920');
  array c3(8) $ _temporary_ ('30400' '51889' '30402' '30403' '30470' '30471' '30472' '30473');
  array c4(4) $ _temporary_ ('30550' '30551' '30552' '30553');
  array c5(5) $ _temporary_ ('5859' '78650' '96502' '96509' 'E8501');
  array c6(2) $ _temporary_ ('96501' 'E8500');
  array c7(7) $ _temporary_ ('5855' '7452' '51883' '7489' '78600' '7991' '99739');
  array dx(*) $  dx1-dx10;
  set have;
  do i=1 to dim(dx);
    if dx(i) in c1 then category=1;
    else if dx(i) in c2 then category=2;
    else if dx(i) in c3 then category=2;
    else if dx(i) in c4 then category=2;
    else if dx(i) in c5 then category=2;
    else if dx(i) in c6 then category=2;
    else if dx(i) in c7 then category=2;
    else call missing(category);
    if not missing(category) then leave;
  end;
run;

Art, CEO, AnalystFinder.com

 

Occasional Contributor
Posts: 13

Re: Add a new column based on the entries of other columns

Thank you vrey much! But, when I run the code, both "Have" and "Want" data files only show ENROLID (i.e., DX1,...DX10 are not shown). 

Furthermore, as I mentioned in my first message, the data provided is just a part of the original data (with almost 10 million observations). So, if I follow your approach, how can I accomodate the data?

 

Thanks again!

PROC Star
Posts: 7,363

Re: Add a new column based on the entries of other columns

When I run the code I get enrollid, dx1-dx10, and category. You would have to show your log.

 

Should work regardless of how many records there are.

 

Art, CEO, AnalystFinder.com

Occasional Contributor
Posts: 13

Re: Add a new column based on the entries of other columns

I was able to find the answer. I used a part of your code. That DEFINITELY helped. Thanks a lot!

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 154 views
  • 0 likes
  • 2 in conversation