DATA Step, Macro, Functions and more

Need help to handle missing values

Accepted Solution Solved
Reply
Super Contributor
Posts: 265
Accepted Solution

Need help to handle missing values

I was trying to attached sample dataset along with this post. but its not taking excel or sas dataset,.

I have paste subset here for ur reference , where is missing values in org_id , i want to fill that missing values with related org_id.

is it possible to fill those in sas..

Please suggest , thanks

Observations are :

Brandidorg_idsales
A111113252
A221112859
A331113539
A441113890
A551113944
A661113825
A771114711
A881113178
A991114992
A10101114020
A11111112589
A12121111559
B 1131111949
B 214
B 315
B 416
B 517
B 618
B 719
B 820
B 921
B 10221113764
B 11231111402
B 12241111723
C1251111086
C2261114767
C3271111455
C4281113552
C5291114862
C630
C731
C832
C933
C1034
C11351113482
C12361111377
D1371112803
D2381113021
D3391113074
D4401111246
D541
D642
D743
D8441113690
D9451111022
D10461114089
D11471111603
D12481113898
A112223252
A222222859
A332223539
A442223890
A552223944
A662223825
A772224711
A882223178
A99
A1010
A1111
A1212
B 113
B 214
B 3152223539
B 4162223890
B 5172223944
B 6182223825
B 7192224711
B 8202223178
B 9212225435
B 10222223764
B 11232221402
B 12242221723
C1252221086
C2262224767
C3272221455
C4282223552
C5292224862
C6302221949
C7312222859
C8322223539
C9332223890
C10342223944
C11352223825
C12362221377
D1372222803
D2382223021
D339
D440
D541
D642
D743
D844
D945
D10462223825
D11472221603
D12482223898
A113333252
A223332859
A333333539
A443333890
A553333944
A663333825
A773334711
A883333178
A993334992
A10103334020
A11113332589
A12123331559
B 1133331949
B 2143333890
B 3153333944
B 4163333825
B 5173334711
B 6183333178
B 719333234
B 8203335346
B 9213333566
B 1022
B 1123
B 1224
C125
C226
C327
C428
C5293334862
C6303333898
C7313333252
C8323332859
C9333333539
C10343332345
C11353333482
C12363331377
D1373332803
D2383333021
D3393333074
D440
D541
D642
D743
D844
D9453331022
D10463334089
D11473331603
D12483333898
A114443252
A224442859
A334443539
A444443890
A554443944
A664443825
A774444711
A884443178
A994444992
A10104444020
A11114442589
A12124441559
B 1134441949
B 2144441949
B 3154442859
B 4164443539
B 5174443890
B 6184443944
B 7194443825
B 820
B 921
B 1022
B 1123
B 1224
C125
C226
C327
C428
C5294444862
C6304441949
C7314442859
C8324443539
C9334443890
C10344443944
C11354443825
C12364441377
D1374442803
D2384443021
D339
D440
D541
D642
D743
D8444443690
D9454441022
D10464444089
D11474441603
D12484443898
A115553252
A225552859
A335553539
A445553890
A555553944
A665553825
A775554711
A885553178
A995554992
A10105554020
A11115552589
A12125551559
B 1135551949
B 214555
B 315555
B 416555
B 517555
B 618555
B 719555
B 820555
B 921555
B 10225553764
B 11235551402
B 12245551723
C1255551086
C2265554767
C3275551455
C4285553552
C5295554862
C630555
C731555
C832555
C933555
C1034555
C11355553482
C12365551377
D1375552803
D2385553021
D3395553074
D4405551246
D541555
D642555
D743555
D8445553690
D9455551022
D10465554089
D11475551603
D12485553898

Accepted Solutions
Solution
‎06-09-2015 06:39 AM
Super User
Super User
Posts: 7,407

Re: Need help to handle missing values

Yes, you can generalise the code several ways, example below.  The question is why would you want to do this?  The information is already there in org_id, why create 11000 columns to duplicate what is already there in org_id, all you are doing is vastly sizing up your data for no additional benefit?  I.e. if I want to sum all records with 11 as org_id, then i do a sum() where org_id=11, I don't create a new variable with a 1 for records with org_id and then sum those.

data have (drop=i);

  org_id=11;

  do i=1 to 5; output; end;

  org_id=22;

  do i=1 to 8; output; end;

  org_id=33;

  do i=1 to 10; output; end;

run;

proc sql;

  create table LOOP as

  select  distinct ORG_ID

  from    WORK.HAVE;

quit;

options missing=0;

data _null_;

  set loop end=last;

  if _n_=1 then call execute('data want; set have; ');

  call execute(' if org_id='||strip(org_id)||' then _'||strip(org_id)||'=1;');

  if last then call execute('run;');

run;

View solution in original post


All Replies
Super User
Posts: 6,948

Re: Need help to handle missing values

define "related org_id"

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super Contributor
Posts: 265

Re: Need help to handle missing values

Each org has 12 brand . i m looking to replace missing org-id  with related org_id

I.e. 1-12 brand for org 111 where few observation are having missing value for org id. I want to replace them with 111.

Kindly suggest

Valued Guide
Posts: 858

Re: Need help to handle missing values

make a different dataset with all the org_id that you want, then join.  I hope you realize that nobody would have ever known what it was you wanted without you saying that.  You should give an example of what it is you are looking for.

Super Contributor
Posts: 265

Re: Need help to handle missing values

Apology in Advance for posting my question here , as i m not able to post my question in discussion forum,

Kindly help and suggest.

Hi All,

i have a data set like :


Org_id

11

11

11

11

11

22

22

22

22

22

22

22

22

33

33

33

33

33

33

33

33

33

33


and i looking for dataset like :

Org_id _11 _22 _33

11 1 0 0

11 1 0 0

11 1 0 0

11 1 0 0

11 1 0 0

22 0 1 0

22 0 1 0

22 0 1 0

22 0 1 0

22 0 1 0

22 0 1 0

22 0 1 0

22 0 1 0

33 0 0 1

33 0 0 1
33 0 0 1

33 0 0 1

33 0 0 1

33 0 0 1

33 0 0 1

33 0 0 1

33 0 0 1

33 0 0 1

i trying code is :

%let org_list1=11,22,33 (in main code i have created it by using proc sql and separated bby ",")

Data Temp;

set mmm_test;

do i =1 to 2;

if org_id= input(scan("&org_list1",1,","),12.) then

do; %let var= _%scan("&org_list1",i,",");

&var=1;

end;

else do;

%let var= _%scan("&org_list1",i,",");

&var=0;

end;

end;

run;


its not working.

Kindly suggest,

Thanks & Regards,

Super User
Super User
Posts: 7,407

Re: Need help to handle missing values

It doesn't really make sense what you are trying to do.  Some code which will get the result is below, but clarify your question:

data have;

  org_id=11;

  do i=1 to 5; output; end;

  org_id=22;

  do i=1 to 8; output; end;

  org_id=33;

  do i=1 to 10; output; end;

run;

data want;

  set have;

  _11=0; _22=0; _33=0;

  if org_id=11 then _11=1;

  if org_id=22 then _22=1;

  if org_id=33 then _33=1;

run;

Super Contributor
Posts: 265

Re: Need help to handle missing values

i have data where 11000 org_id available and each org_id for 12 months, to do further analysis i need to create variable as org_id num and put one for same id in observation else 0

as i showed in my desired output. u made for static scenario.

Solution
‎06-09-2015 06:39 AM
Super User
Super User
Posts: 7,407

Re: Need help to handle missing values

Yes, you can generalise the code several ways, example below.  The question is why would you want to do this?  The information is already there in org_id, why create 11000 columns to duplicate what is already there in org_id, all you are doing is vastly sizing up your data for no additional benefit?  I.e. if I want to sum all records with 11 as org_id, then i do a sum() where org_id=11, I don't create a new variable with a 1 for records with org_id and then sum those.

data have (drop=i);

  org_id=11;

  do i=1 to 5; output; end;

  org_id=22;

  do i=1 to 8; output; end;

  org_id=33;

  do i=1 to 10; output; end;

run;

proc sql;

  create table LOOP as

  select  distinct ORG_ID

  from    WORK.HAVE;

quit;

options missing=0;

data _null_;

  set loop end=last;

  if _n_=1 then call execute('data want; set have; ');

  call execute(' if org_id='||strip(org_id)||' then _'||strip(org_id)||'=1;');

  if last then call execute('run;');

run;

Super User
Posts: 6,948

Re: Need help to handle missing values

* create a list of distinct org_id's;

proc sort data=mmm_test (keep=org_id) out=temp nodupkey;by org_id;run;

* dynamically create code for each org_id;

data _null_;

set temp end=done;

if _n_ = 1 then do;

  call execute('data want; set mmm_test;');

end;

call execute('if org_id = ' !! strip(put(org_id,best5.)) !! ' then _' !! strip(put(org_id,best5.)) !! ' = 1; else _' !! strip(put(org_id,best5.)) !! ' = 0;');

if done then call execute('run;');

run;

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super Contributor
Posts: 265

Re: Need help to handle missing values

Thanks a lot for ur great support.

Sir is it possible any solution which resolve error in my code (provided in my questiom)using my logic?

Thanks once again for ur solution.

Super User
Posts: 6,948

Re: Need help to handle missing values

Lets dissect this:

%let org_list1=11,22,33 (in main code i have created it by using proc sql and separated bby ",")

This is executed by the macro processor before the following data step starts compiling, note the italics

Data Temp;

set mmm_test;

do i =1 to 2;

if org_id= input(scan("&org_list1",1,","),12.) then

do; %let var= _%scan("&org_list1",i,",");

This macro statement is evaluated by the macro processor during the compilation phase of the data step. Since i at this moment is just text, and not a variable name, the %scan will fail, as it requires a numeric value

&var=1;

Since the %let fails, this will produce no meaningful code

end;

else do;

%let var= _%scan("&org_list1",i,",");

Will also fail, for reasons stated above; also note that both macro statements will ALWAYS execute, as the data step if-then-else only works while the data step EXECUTES, not while it is COMPILED.

&var=0;

Same as above

end;

end;

run;

Macro statements are NEVER executed while data steps (or other procs) run, but always BEFORE, as soon as macro text (& and %) is encountered. The ONLY interaction between macro and data step can be done with certain subroutines like call symput.

Thoroughly study the SAS docs regarding the macro engine, or you will repeat mistakes like that again and again.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super Contributor
Posts: 265

Re: Need help to handle missing values

Thanks and apology for my mistake.

Super User
Posts: 6,948

Re: Need help to handle missing values

No need to apologize. Read and improve, and your questions will become harder and harder for "experts" like us to solve. Which makes them more interesting.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Respected Advisor
Posts: 3,777

Re: Need help to handle missing values

Creating dummies is usually not necessary with SAS as the CLASS statement will handle that.  You can get SAS to create a data set of dummies with PROC TRANSREG which has a plethora of options to control how they are created and named.

data org;
   input Org_id;
   cards;
11
11
11
11
11
22
22
22
22
22
22
22
22
33
33
33
33
33
33
33
33
33
33
;;;;
   run;
proc print;
  
run;
options validvarname=v7;
proc transreg cprefix=0;
  
model class(org_id / zero=sum);
   output out=dummies design=10000;
  
run;
proc print;
  
run;
6-9-2015 8-11-49 AM.png
Valued Guide
Posts: 858

Re: Need help to handle missing values

I think this is what you are looking for.  Keep in mind if the dataset is not sorted this way it will not work:

data have;

infile cards dsd;

input Brand    $ id org_id    sales;

cards;

A1,1,111,3252

A2,2,111,2859

A3,3,111,3539

A4,4,111,3890

A5,5,111,3944

A6,6,111,3825

A7,7,111,4711

A8,8,111,3178

A9,9,111,4992

A10,10,111,4020

A11,11,111,2589

A12,12,111,1559

B 1,13,111,1949

B 2,14,,

B 3,15,,

B 4,16,,

B 5,17,,

B 6,18,,

B 7,19,,

B 8,20,,

B 9,21,,

B 10,22,111,3764

B 11,23,111,1402

B 12,24,111,1723

C1,25,111,1086

C2,26,111,4767

C3,27,111,1455

C4,28,111,3552

C5,29,111,4862

C6,30,,

C7,31,,

C8,32,,

C9,33,,

C10,34,,

C11,35,111,3482

C12,36,111,1377

D1,37,111,2803

D2,38,111,3021

D3,39,111,3074

D4,40,111,1246

D5,41,,

D6,42,,

D7,43,,

D8,44,111,3690

D9,45,111,1022

D10,46,111,4089

D11,47,111,1603

D12,48,111,3898

A1,1,222,3252

A2,2,222,2859

A3,3,222,3539

A4,4,222,3890

A5,5,222,3944

A6,6,222,3825

A7,7,222,4711

A8,8,222,3178

A9,9,,

A10,10,,

A11,11,,

A12,12,,

B 1,13,,

B 2,14,,

B 3,15,222,3539

B 4,16,222,3890

B 5,17,222,3944

B 6,18,222,3825

B 7,19,222,4711

B 8,20,222,3178

B 9,21,222,5435

B 10,22,222,3764

B 11,23,222,1402

B 12,24,222,1723

C1,25,222,1086

C2,26,222,4767

C3,27,222,1455

C4,28,222,3552

C5,29,222,4862

C6,30,222,1949

C7,31,222,2859

C8,32,222,3539

C9,33,222,3890

C10,34,222,3944

C11,35,222,3825

C12,36,222,1377

D1,37,222,2803

D2,38,222,3021

D3,39,,

D4,40,,

D5,41,,

D6,42,,

D7,43,,

D8,44,,

D9,45,,

D10,46,222,3825

D11,47,222,1603

D12,48,222,3898

A1,1,333,3252

A2,2,333,2859

A3,3,333,3539

A4,4,333,3890

A5,5,333,3944

A6,6,333,3825

A7,7,333,4711

A8,8,333,3178

A9,9,333,4992

A10,10,333,4020

A11,11,333,2589

A12,12,333,1559

B 1,13,333,1949

B 2,14,333,3890

B 3,15,333,3944

B 4,16,333,3825

B 5,17,333,4711

B 6,18,333,3178

B 7,19,333,234

B 8,20,333,5346

B 9,21,333,3566

B 10,22,,

B 11,23,,

B 12,24,,

C1,25,,

C2,26,,

C3,27,,

C4,28,,

C5,29,333,4862

C6,30,333,3898

C7,31,333,3252

C8,32,333,2859

C9,33,333,3539

C10,34,333,2345

C11,35,333,3482

C12,36,333,1377

D1,37,333,2803

D2,38,333,3021

D3,39,333,3074

D4,40,,

D5,41,,

D6,42,,

D7,43,,

D8,44,,

D9,45,333,1022

D10,46,333,4089

D11,47,333,1603

D12,48,333,3898

A1,1,444,3252

A2,2,444,2859

A3,3,444,3539

A4,4,444,3890

A5,5,444,3944

A6,6,444,3825

A7,7,444,4711

A8,8,444,3178

A9,9,444,4992

A10,10,444,4020

A11,11,444,2589

A12,12,444,1559

B 1,13,444,1949

B 2,14,444,1949

B 3,15,444,2859

B 4,16,444,3539

B 5,17,444,3890

B 6,18,444,3944

B 7,19,444,3825

B 8,20,,

B 9,21,,

B 10,22,,

B 11,23,,

B 12,24,,

C1,25,,

C2,26,,

C3,27,,

C4,28,,

C5,29,444,4862

C6,30,444,1949

C7,31,444,2859

C8,32,444,3539

C9,33,444,3890

C10,34,444,3944

C11,35,444,3825

C12,36,444,1377

D1,37,444,2803

D2,38,444,3021

D3,39,,

D4,40,,

D5,41,,

D6,42,,

D7,43,,

D8,44,444,3690

D9,45,444,1022

D10,46,444,4089

D11,47,444,1603

D12,48,444,3898

A1,1,555,3252

A2,2,555,2859

A3,3,555,3539

A4,4,555,3890

A5,5,555,3944

A6,6,555,3825

A7,7,555,4711

A8,8,555,3178

A9,9,555,4992

A10,10,555,4020

A11,11,555,2589

A12,12,555,1559

B 1,13,555,1949

B 2,14,555,

B 3,15,555,

B 4,16,555,

B 5,17,555,

B 6,18,555,

B 7,19,555,

B 8,20,555,

B 9,21,555,

B 10,22,555,3764

B 11,23,555,1402

B 12,24,555,1723

C1,25,555,1086

C2,26,555,4767

C3,27,555,1455

C4,28,555,3552

C5,29,555,4862

C6,30,555,

C7,31,555,

C8,32,555,

C9,33,555,

C10,34,555,

C11,35,555,3482

C12,36,555,1377

D1,37,555,2803

D2,38,555,3021

D3,39,555,3074

D4,40,555,1246

D5,41,555,

D6,42,555,

D7,43,555,

D8,44,555,3690

D9,45,555,1022

D10,46,555,4089

D11,47,555,1603

D12,48,555,3898

;

run;

data want(rename=(org=org_id));

retain org;

set have;

if org_id ne . then org=org_id;

drop org_id;

run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 16 replies
  • 540 views
  • 2 likes
  • 6 in conversation