proc import datafile="c:/data.csv" out=datain replace dbms=csv;
getnames=no;
datarow=4;
guessingrows=40;
run;
data need (drop=var2 var3);
set datain (rename=(var1=geo_id));
if substr(var2,1,89)='Estimate; Female: - Worked full-time, year-round in the past 12 months: - With income: -';
if substr(reverse(strip(var2)),1,7) eq 'erom ro' then salary=100000;
else if substr(reverse(strip(var2)),1,7) eq 'ssol ro' then salary=2499;
else salary=input(substr(var2,index(var2,'to ')+4),comma8.);
count=input(var3,8.);
run;
proc report data = datfile.dat NOWINDOWS HEADLINE;
title 'Auglaize County sorted Data';
column Wages Full time Part time Population;
run;
LIBNAME datafile "c file location";
proc report data = datafile.dat NOWINDOWS;
title 'Auglaize County sorted Data';
run;
PROC SORT DATA=auto OUT=auto2 ;
BY foreign ;
RUN ;
PROC PRINT DATA=auto2 ;
RUN ;
@SixUnder wrote:I am going to try and create the data step, the input and possibly the variables and go from there. I will come back here and post once i get that far. Whos knows, you are all much more advanced than I am.
close as i have been so far..
I think your program looks pretty straightforward and clean.
One comment - on the use of the REVERSE function, which you are using to get substrings on the right-hand side of some text. You can take advantage of the LENGTH function instead, which will make their intention more apparent:
I.e. instead of
if=substr(reverse(strip(var2)),1,7) eq 'erom ro' then salary=100000;
else if substr(reverse(strip(var2)),1,7) eq 'ssol ro' then salary=2499;
else salary=input(substr(var2,index(var2,'to ')+4),comma8.);
you can use
if substr(salary,length(salary)-6)='or more' then salary=100000;
else if substr(salary,length(salary)-6)='or less' then salary=2499;
else salary=input(substr(var2,index(var2,'to ')+4),comma8.);
And the third line can be changed to:
else salary=input(scan(vars,3,' '),comma8.);
which uses the SCAN function to retrieve the 3rd word of VAR2. No need to find the letters "to" and then add 4.
Yes, you're quite close, but your code has some likely errors. e.g.,
proc report data = datfile.dat NOWINDOWS HEADLINE;
You will be specifying the SAS dataset set that you created. Thus it should be more like:
proc report data = need NOWINDOWS HEADLINE;
Same thing for your other proc report. And, in your proc sort, where did data 'auto' come from?
Finally, you mentioned that your specs called for you to build a report based on a sample of 20 records. That can be accomplished with a datastep like:
data subset; SampleSize = 20; Left = numberObs; do i = 1 to numberObs; set need point = i nobs = numberObs; if uniform(0) le SampleSize/Left then do; /* select the observations */ SampleSize + (-1); /* decrement needed by 1 */ output; end; /* select the observations */ Left +(-1); /* decrement remaining obs by 1 */ end; stop; run;
HTH,
Art, CEO, AnalystFinder.com
What do you think of this? Any advise would be appreciated. Thanks. So sad on my part, i still have 3 more of these to do in the next month
data subset; SampleSize = 20; Left = numberObs; do i = 1 to numberObs; set need point = i nobs = numberObs; if uniform(0) le SampleSize/Left then do; /* select the observations */ SampleSize + (-1); /* decrement needed by 1 */ output; end; /* select the observations */ Left +(-1); /* decrement remaining obs by 1 */ end; stop; run;
*Identify SAS Library folder?
LIBNAME wk3 '<path>'?
*Import xls file to an SAS Dataset?
PROC IMPORT DATAFILE='<path>/<file>.xls' DBMS= XLS OUT=wk3.MedPop1 REPLACE?
RUN?
/*Create a dataset based off the original xls, but limit the data to the Female/ Full Time Population */
Data MedFem?
SET MedPop1 ?
Where Sex = 'F' AND Status ='FT'?
RUN?
/*Creates a raw 'txt' file to hold the data */
DATA _NULL_?
SET MedFem?
FILE '<path>/<file>.txt' PRINT?
PUT SEX $ POPULATION Salary_Low Salary_High?
RUN?
/* Sorts the data by Salary_High in descending order (highest to lowest salary */
PROC SORT data = MedFem?
BY DESCENDING Salary_High?
RUN?
/*Creates a raw 'txt' file to hold the data */
DATA _NULL_?
SET MedFem?
FILE '<path>/<file>.txt' PRINT?
PUT SEX $ POPULATION Salary_Low Salary_High?
RUN?
/* Sorts the data from Highest to lowest population #'s per salary */
PROC SORT data = MedFem?
BY DESCENDING Population?
RUN?
/*Creates a raw 'txt' file to hold the data */
DATA _NULL_?
SET MedFem?
FILE '<path>/<file>.txt' PRINT?
PUT SEX $ POPULATION Salary_Low Salary_High?
RUN?
A couple of notes on the program as you have constructed it:
Data Calculation;
infile '.csv' DSD Missover firstobs=2;
Input Gender $ Wages_Low Wages_High Estimated Max_Earnings;
Proc Print Data=Calculation;
run;
then a program for sort 2
Data Calculation;
infile '.csv' DSD Missover firstobs=2;
Input Gender $ Wages_Low Wages_High Estimated Max_Earnings;
Proc Sort Data=Calculation Out=Sort2;
By Descending Max_Earnings;
Run;
and sort 3...
Data Calculation;
infile '.csv' DSD Missover firstobs=2;
Input Gender $ Salary_Low Salary_High Estimate Max_Earnings;
Proc Sort Data=Calculation Out=Sort3;
Key Estimated / descending;
Run;
Would that make 3 tables, showing the requested information?
If i make my excel file like this...
Gender- Salary Low End- Salary High End- Esitmates- Maximum Earnings
female 1 1 1 1
female 5 7 6 8
etc. down to the 20 requested
Will that work?
Boom!, my work did it! Now only 3 more this month, haha!
Why did you import the data 3 times?
It is actually 3 different programs, i just didnt separate them in the code. My apologies, on to the next problem!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.