BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SixUnder
Obsidian | Level 7

Ok, now im onto another problem, and this one does not get any easier adding in do loops. Of which i have never done in SAS, so this should be fun. My last problem, i ended up solving myself in the end, with lots of suggestions and snippets i used from everyone contributing on here, so i figured lets share another one.

 

I have a .csv comma delimited file, randomly generated with a 20 list of wage ranges starting from $1-$2,499, $2,500-$4,999, on up to $100,000 and over.

 

I have to write a SAS® program to access the Microsoft® Excel® spreadsheet and convert the income range labels into two numerical variables for male and female data that worked full-time and have earnings. You can ignore the Other category in each gender data. As an example, the label $1 to $2,499 or loss would be stored in two variables with content: 1 and 2499.

 

first glance,

proc format;
value sex
1 = `male';
2 = 'female';
run;

Use a DO loop to process the labels into numerical variables.

first glance, do i want to make a array?

data salary;
set ?
array ? (20)

do i = 1 to 20;
not sure on calculation from label to variable
end;
run;

 

When creating and processing the variables, maintain the gender identification of the data so that the information about male and female earning categories is preserved. Given 20 earning labels per gender category, you should have 80 variables when finished, 40 per gender category. When converting the label $100,000 or more use numbers: 100000 and 149999.

 

 

Save the converted earning category data in a raw data file.

 

Going to be a long week....

1 ACCEPTED SOLUTION

Accepted Solutions
SixUnder
Obsidian | Level 7

My code and format was correct, however i had to modify the .csv file.

It needed to read from left to right.

Gender                                 Wages                                Population

20 female entries                 $1-2499 or loss                     7

20 male entries                    $2500-$4999                        0

 

All the way down, the 40 entries were then split in half top end and bottom end. The following code resolved it. I always do something silly...., i just had to change my .csv file and this thing worked like a charm.

 

Data scan;
*call routine, Returns the position and length of the nth word from a character string. ;
*creation of data step in week 4;
infile '.csv' DSD firstobs=2;
*location of file for analysis, tells SAS to start reading at line 2;
Length Gender $7;
*character to list in length for gender, male or female;
Length Earnings $20;
*character to list in length for amounts earned, high and low;
Length Population 7.0;
*total population in that wage range, up to 7 digits;
Input Gender $ Earnings $ Population;
*framework of table, the three headings;
Low_wages=scan(Earnings,1,' ');
*SAS Institute. (2017).  SCAN Function. Retrieved from http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000214639.htm ;
High_Wages=Scan(Earnings,3,' ');
*SAS Institute. (2017).  SCAN Function. Retrieved from http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000214639.htm ;
If High_Wages='more' then High_Wages='149999';
*If, then, else;
*SAS Institue. (2017). SAS Operators In Expressions. Retrieved from http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000780367.htm;
*beginning of the DO loop, set to 40 original readings, with a second 40 to follow;
Data DoLoop;

set scan;
*scan function;
Do i= 1 to 40;
*repeats and reads the entries for 40 values;
Low_wages3=input(Low_Wages, dollar7.0);
*input of low wage end in dollar format up to 7 places;
End;
*Executes statements between the DO and END statements repetitively, based on the value of an index variable;
Do i= 1 to 40;
*repeats and reads the entries for 40 values;
High_Wages3=input(High_Wages, dollar7.0); 
*input of low wage end in dollar format up to 7 places;
End;
Proc print;
*provids the results printable to a exportable raw file;
Title 'DO Loop';
*name of file;
Var Gender Low_Wages3 High_Wages3 Population;
*Column heading names;
run;

 

 

View solution in original post

10 REPLIES 10
art297
Opal | Level 21

No, you don't need an array:

 

data have;
  informat range $30.;
  input range &;
  low=input(scan(range,1,"to"),comma8.);
  high=input(scan(range,2,"to"),comma8.);
  cards;
$2,500 to $4,999
$1 to $2,499
;

Art, CEO, AnalystFinder.com

SixUnder
Obsidian | Level 7

first does this work just for one of the 80 entries? Or do i have to write this for everyone? I am suppose to include a "DO Loop" in this program is why i am asking.

 

data have;
  informat range $30.;
  input range &;
  low=input(scan(range,1,"to"),comma8.);
  high=input(scan(range,2,"to"),comma8.);
  cards;
$2,500 to $4,999
$1 to $2,499
;

Also, could i use this to sort the 80 entries? or how would you do it?

 

proc sql;
select sum(nobs) as TotalRows
from 'file.csv'.tables
where libname eq 'census';
quit;

Though I am not sure if this is even written correctly.

 

 

art297
Opal | Level 21

It would work, as is, on all of your records. To sort, just use proc sort.

 

Art, CEO, AnalystFinder.com

 

SixUnder
Obsidian | Level 7
data have;
  informat range $30.;
  input range &;
  low=input(scan(range,1,"to"),comma8.);
  high=input(scan(range,2,"to"),comma8.);
  cards;
$2,500 to $4,999
$1 to $2,499
;

This will convert the 80 entries and put them in 2 columns with lables? Ok. I guess its on to the sorting then.

Tom
Super User Tom
Super User

Why the heck would you want 80 different variables?  Why can't you just leave the variables as Gender and Salary_Range and have multiple observations to indicate the different combinations?


@SixUnder wrote:

When creating and processing the variables, maintain the gender identification of the data so that the information about male and female earning categories is preserved. Given 20 earning labels per gender category, you should have 80 variables when finished, 40 per gender category. When converting the label $100,000 or more use numbers: 100000 and 149999.

 

SixUnder
Obsidian | Level 7

Tom,

 

That is part of the reason i questioned why or how that code works for this problem i have been give. If you look at the question, and features of what I have been asked to complete, I am still unsure if i even have the .csv file correctly formatted. Everytime i read over it, i think of something else, it may or could look like. I am attaching my current .csv file to show the data I am suppose to have "80 entries total for".

 

Here is the portion of the discussion I am missing out on. This should be a small simple program right?

 

Myabe what Tom says is ok, but i still dont know how to sepereate that in the code. The more i research this problem the more confused i get.

 

Write a SAS® program to access the Microsoft® Excel® spreadsheet and convert the income range labels into two numerical variables for male and female data that worked full-time and have earnings. You can ignore the Other category in each gender data. As an example, the label $1 to $2,499 or loss would be stored in two variables with content: 1 and 2499.

When creating and processing the variables, maintain the gender identification of the data so that the information about male and female earning categories is preserved. Given 20 earning labels per gender category, you should have 80 variables when finished, 40 per gender category. When converting the label $100,000 or more use numbers: 100000 and 149999.

 

Save the converted earning category data in a raw data file.

 

Sometimes the best way for me to solve a problem is with color, and normally this would make things clearer, but for me its not working. Most of it has to do with the following...

 

converting the ranges..

formatting the .csv properly

storing the data in a raw file

 

I think i understand the following will informat and modify the ranges, but I still feel depleted like im missing something. And its the first time i have been asked to submit a raw data file, that is just the output i would assume exported from SAS. I just need to tie all this together, I only have to complete one more program after this one, I wish i knew as much as some of you do.

 

data Auglaize;
infile '.csv';
data have; informat range $30.; input range &; low=input(scan(range,1,"to"),comma8.); high=input(scan(range,2,"to"),comma8.); cards; $2,500 to $2,499
$1 to $2,;
*what about every other range in the table?;
*I am suppose to have 80 variables if i am reading that correctly;

Proc Print Data=Auglaize;
*specify the data set to input from;
run;

This is as close as i have came to get it working, can anyone help me with the kinks?

 

I am getting a error about there being no columns, and its referencing a table I am not even trying to use. Its referencing a file named completely different than the one I am using.

 

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
55
56 data Auglaize;
57 infile '/home/ktylu10/Auglaize County Ohio Jason.csv';
 
NOTE: The infile '/home/ktylu10/Auglaize County Ohio Jason.csv' is:
Filename=/home/ktylu10/Auglaize County Ohio Jason.csv,
Owner Name=ktylu10,Group Name=oda,
Access Permission=-rw-rw-r--,
Last Modified=26Feb2017:20:46:45,
File Size (bytes)=1695
 
NOTE: 0 records were read from the infile '/home/ktylu10/Auglaize County Ohio Jason.csv'.
NOTE: The data set WORK.AUGLAIZE has 1 observations and 0 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 763.93k
OS Memory 27808.00k
Timestamp 02/27/2017 01:53:06 AM
Step Count 24 Switch Count 44
Page Faults 0
Page Reclaims 344
Page Swaps 0
Voluntary Context Switches 125
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 160
 
58 data have;
 
59
60 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

 

Tom
Super User Tom
Super User

You need to ask the professor what the assignment means. You have posted a CSV file, but originally you said you had a Excel file.

If you have an Excel file you can just import it into SAS.  So if you have an XLSX file named 'myfile.xlsx' then this code will create a table with 40 observations and three variables. 

proc import replace out=given dbms=xlsx 
  datafile='myfile.xlsx'
;
run;

If you really have CSV file then why not just read it directly. That way you can control the variable name that it uses instead of SAS trying to conver the column header into a variable name. The first column has a header that would make a bad variable name.

data given;
  length sex_plus_range $50 population date 8 ;
  informat population comma. date mmddyy. ;
  format date yymmdd10. ;
  infile 'myfile.csv' dsd firstobs=2 ;
  input sex_plus_range population date ;
run;

You could then use the dataset created above and try to split the first column into nicer values like in the early replies have shown.

data nicer ;
  length sex $6 age_range $40 population date 8 ;
  set given;
  sex = scan(sex_plus_range,1);
  age_range = substr(sex_plus_range,length(sex)+2);
run;

But perhaps the question is to construct an easier way to read those values?  That could be done using DO loop.

data nicer ;
  infile 'myfile.csv' dsd firstobs=2 ;
  length sex $6 age_range $40 population date 8;
  informat population comma. date mmddyy. ;
  format date yymmdd10. ;
  do sex='Male','Female';
    do age_range ="$1 to $2,499 or less"
                 ,"$2,500 to $4,999"
                 ,"$5,000 to $7,499"
                 ,"$7,500 to $9,999"
                 ,"$10,000 to $12,499"
                 ,"$12,500 to $14,999"
                 ,"$15,000 to $17,499"
                 ,"$17,500 to $19,999"
                 ,"$20,000 to $22,499"
                 ,"$22,500 to $24,999"
                 ,"$25,000 to $29,999"
                 ,"$30,000 to $34,999"
                 ,"$35,000 to $39,999"
                 ,"$40,000 to $44,999"
                 ,"$45,000 to $49,999"
                 ,"$50,000 to $54,999"
                 ,"$55,000 to $64,999"
                 ,"$65,000 to $74,999"
                 ,"$75,000 to $99,999"
                 ,"$100,000 or more"
     ;
       input skip $ population date ;
       drop skip;
       output;
    end;
  end;
run;

 

I am still at a loss for what they mean about "80 variables". You really only have 40 population data values, so perhaps the question is to convert those 40 observations into a dataset with one observation and 80 variables?  What are the other 40 variables supposed to be?  Do they want you to make 40 copies of the constant Date value?

 

What names do they want for these variables?  If they don't care then why not just name them POP1 to POP40?

data middle ;
  set given ;
  id+ 1;
run;
proc transpose data=middle out=want(drop=_:) prefix=POP;
  id id ;
  idlabel sex_plus_range ;
  var population;
run;

 

SixUnder
Obsidian | Level 7

Ok, I think i have this how i want it, but my data is not showing up in the proper columns. I was able to get soem clarification on the data for our table. I am attaching it, in addition, i have the code here but for some reason it is not displaying the data as i need.

 

Instead of this...

O b s                 G e n d e r                               Low_Wages 2                                  High_Wages 2                  P o p u l a t i o n
1                               M a l e                                    1                                                                   2 4 9 9                                 21
2                               M a l e                                  2 5 0 0                                                             4 9 9 9                               7
3                               M a l e                                  5 0 0 0                                                             7 4 9 9                                 38
4                               M a l e                                  7 5 0 0                                                             9 9 9 9                                33

 

I am getting this....as results...and it is wrong, of course.

 

O b s                 G e n d e r                               Low_Wages 2                                  High_Wages 2                  P o p u l a t i on
1                               M a l e                                    1                                                                                                            2499
2                               M a l e                                  2 5 0 0                                                                                                     4999
3                               M a l e                                  5 0 0 0                                                                                                     7499
4                               M a l e                                  7 5 0 0                                                                                                     9999

 

here is where my code currently sits.. I know its because of how time i have put into this, i am missing something. Can anyone see my error?

 

Data Scan;

infile '.csv' DSD firstobs=2;

Length Gender $6;

Length Earnings $19;

Length Population 7.0;

Input Gender $ Earnings $ Population;

Low_wages=scan(Earnings,1,' ');

High_Wages=Scan(Earnings,3,' ');

If High_Wages='more' then High_Wages='149999';

 

/*Do Loop*/

Data DOLoop;

set Scan;

Do i= 1 to 40;

Low_wages2=input(Low_Wages, dollar7.0);

End;

Do i= 1 to 40;

High_Wages2=input(High_Wages, dollar7.0); 

End;

 

Proc print;

Title 'Week 4 Do Loop';

Var Gender Low_Wages2 High_Wages2 Population;

run;

 

 

 

 

SixUnder
Obsidian | Level 7
Nm, i found my mistake. I will post it shortly.
SixUnder
Obsidian | Level 7

My code and format was correct, however i had to modify the .csv file.

It needed to read from left to right.

Gender                                 Wages                                Population

20 female entries                 $1-2499 or loss                     7

20 male entries                    $2500-$4999                        0

 

All the way down, the 40 entries were then split in half top end and bottom end. The following code resolved it. I always do something silly...., i just had to change my .csv file and this thing worked like a charm.

 

Data scan;
*call routine, Returns the position and length of the nth word from a character string. ;
*creation of data step in week 4;
infile '.csv' DSD firstobs=2;
*location of file for analysis, tells SAS to start reading at line 2;
Length Gender $7;
*character to list in length for gender, male or female;
Length Earnings $20;
*character to list in length for amounts earned, high and low;
Length Population 7.0;
*total population in that wage range, up to 7 digits;
Input Gender $ Earnings $ Population;
*framework of table, the three headings;
Low_wages=scan(Earnings,1,' ');
*SAS Institute. (2017).  SCAN Function. Retrieved from http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000214639.htm ;
High_Wages=Scan(Earnings,3,' ');
*SAS Institute. (2017).  SCAN Function. Retrieved from http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000214639.htm ;
If High_Wages='more' then High_Wages='149999';
*If, then, else;
*SAS Institue. (2017). SAS Operators In Expressions. Retrieved from http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000780367.htm;
*beginning of the DO loop, set to 40 original readings, with a second 40 to follow;
Data DoLoop;

set scan;
*scan function;
Do i= 1 to 40;
*repeats and reads the entries for 40 values;
Low_wages3=input(Low_Wages, dollar7.0);
*input of low wage end in dollar format up to 7 places;
End;
*Executes statements between the DO and END statements repetitively, based on the value of an index variable;
Do i= 1 to 40;
*repeats and reads the entries for 40 values;
High_Wages3=input(High_Wages, dollar7.0); 
*input of low wage end in dollar format up to 7 places;
End;
Proc print;
*provids the results printable to a exportable raw file;
Title 'DO Loop';
*name of file;
Var Gender Low_Wages3 High_Wages3 Population;
*Column heading names;
run;

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1983 views
  • 0 likes
  • 3 in conversation