06-12-2017 07:35 PM
I've read in some data from a CSV, and part of it is incorrectly aligned (I think values from the first 21 columns are deleted when it's extracted from a dashboard, for some reason). More specifically, a block of the observations are being read in as though the value for the 22nd variable is actually the value for the 1st variable, etc. (the values for variables 1-21 have been deleted somewhere along the line). Is there any way that I can shift those values back to their correct placement in columns 22 onward? Obviously, I would prefer to use as much of the data as possible. So, I'm thinking I'd like to write syntax that does something like (region is column b):
if region not in (1,2,3,4,5,6,7,8,9,10,11,12,13) then [SOMEHOW-SHIFT-RIGHT-21-COLUMNS];
Any suggestions as to what goes in the bracketed space?
I've included a screenshot to better explain the problem with the data.
Thanks in advance!!!
06-12-2017 08:50 PM
Is the shifting in the original CSV file or only in the sas dataset, as the result of importing ?
What code did you use to import the csv file ?
Generally I would use arrays and DO loops to shift data, but it seems that some of the columns are numeric
while others are alphanumeric, and you cannot define array of mixed data types.
06-12-2017 10:36 PM
Good question. The issue with the data exists prior to importing it into SAS and, unfortunately, the data source is resistant to change.
I would be more than open to treating all of the variables as character rather than numeric. Ideas?
06-12-2017 09:34 PM
I suggest fixing this upstream, ie when you export/import into SAS data.
If you imported a CSV or text file, make sure your import specifies TRUNCOVER to avoid these issues. The default isn't usually what people want.
06-12-2017 10:40 PM
Unfortunately, changing the data upstream isn't an option. This is how I receive it.
I use missover, not truncover. If I switch to truncover, it throws off even more of the columns.
06-12-2017 10:50 PM
Can you attach a sample of the original CSV and how it should be read in/look?
Or mock up a representative sample.
Working off theory is slower, if you post some data it makes it easier for us to help you.
06-13-2017 11:30 AM
Just as a reminder, this is how the data is delivered to me. I REALLY wish I could change the way it's compiled upstream, but I can't.
Attached is a sample of records. There are 3 types of records:
Because of the shifted values, I might have to treat all variables as character variables and then reformat them later. That's totally okay, as long as I get the data in the right columns.
06-13-2017 02:11 AM
Something like this?
data _null_; file "%sysfunc(pathname(work))\t.txt"; put '0' @; do i=1 to 29; put ',1' @; end; put ; put 'anniv' @; do i=1 to 9; put ',2' @; end; put ; run; data WANT; infile "%sysfunc(pathname(work))\t.txt" dsd dlm=','; input @; FIRSTWORD=scan(_infile_,1,','); if ^notdigit(compress(FIRSTWORD)) then input a1-a20 @; input a21-a30 ; run;
Modify to suit your data.
06-13-2017 11:33 AM
06-13-2017 02:53 PM
You may start with attached code, demostrating shifting of variables.
Anyway I see some issues:
- shifted variables shall be missing ( if var1 is shiofted to var10 then all var1-var9 will be missing)
- shifting can overide existing values in target variable
- in my code I checked for _N_, the observation number. You may need change it and adapt to other rules.
- finally, you may need to convert some variables from character type to numeric type
filename sample '/folders/myshortcuts/My_Folders/flat/move_block_data.csv'; /* step (1) assuming first rows are full and ok, shifting no needed */ /* use proc import to get variables names as one row data */ options obs=1; proc import datafile=sample dbms=dlm out=sample replace; delimiter='|'; getnames=no; run; /* step (2) create a macro variable with list of data variables */ data _NULL_; set sample end=eof; length varlist $1000; retain varlist; array vx $ _all_; do i=1 to dim(vx)-1; varx = translate(strip(vx(i)),'_',' '); len = length(varx); put i= len= varx=; varlist = strip(varlist) ||' '|| varx; end; if eof then call symput('vars',strip(varlist)); run; %put &vars;
/* step (3) import data and shift according to rules */ options obs=max firstobs=2; data test; format &vars $40.; /* adapt to maximum var length */ infile sample dlm='|' dsd missover; input &vars; array vx $ &vars; /* shifting data */ if 5 le _N_ le 12 then do; do i=dim(vx) to 22 by -1; j = i-22+1; vx(i) = vx(j); vx(j) = ' '; end; end; run;
06-13-2017 03:23 PM
Wow! Thank you so much! I'll give this a shot! Missingness and conversion are okay, and I was wondering about overriding values.
06-13-2017 08:46 PM
Same thing really no?
data _null_; file "%sysfunc(pathname(work))\t.txt" lrecl=800; put 'Provider ID|Facility Last Name|Region|Provider Referral Indicator|Street Number Facility|Address 1 Facility|Address 2 Facility|City Facility|State Code Facility|Postal Code Facility|Street Number Mailing|Address 1 Mailing|Address 2 Mailing|City Mailing|State Code Mailing|Postal Code Mailing|Telecom Number|Provider Schools List|License Months Operation|License Days Operation|Provider Total Capacity|License Restrictions|Provider County Name|Provider Type Report Label|Start Time|End Time|Infants Capacity|Toddlers Capacity|Preschool Capacity|Schoolage Capacity|Special Needs Capacity|Overnight Capacity|Operator Party ID|Operator Name|Provider Artifact Preference|License Number|Initial Operating Date'; put '1|Child Care Center A|1|0|1|Main St||City|ST|12345|1|Main St||City|ST|12345|1234567890|Main St Elementary|January to December|Monday to Friday|63|NO MORE THAN 8 CHILDREN UNDER 30 MONTHS OF AGE MAY BE PRESENT AT ONE TIME. YOUR ANNIVERSARY MONTH IS APRIL.|County1|Licensed Child Care Center|6:30:00|18:00:00|0|0|0|0|0|0|123456||4|123456|5/21/2003 0:00 '; put '2|Child Care Center B|1|0|2|Main St||City|ST|12345|2|Main St||City|ST|12345|1234567890||January to December|Monday to Friday|117|ANNIVERSARY MONTH IS AUGUST. |County2|Licensed Child Care Center|7:00:00|18:00:00|6|12|59|40|0|0|234567||4|234567|8/30/2002 0:00 '; put '3|Child Care Center C|1|0|3|Main St||City|ST|12345|3|Main St||City|ST|12345|1234567890||January to December|Monday to Friday|117|ANNIVERSARY MONTH IS AUGUST. |County3|Licensed Child Care Center|7:00:00|18:00:00|6|12|59|40|0|0|345678||4|345678|8/30/2002 0:00 '; put '4|Child Care Center D|1|0|4|Main St||City|ST|12345|4|Main St||City|ST|12345|1234567890|Main St Elementary|August to June|Monday to Friday|90|Studio Capacity (45) and Party Room Capacity (33) Approved as alternate space. Center may operate from 12:30 to 6:15 pm only when school dismissed early; Outside play restricted to fenced area; Center may operate some full days when school is closed. Anniversary Month is February.|County4|Licensed Child Care Center|7:00:00|18:30:00|0|0|0|90|0|0|456789||4|456789|3/3/2004 0:00'; put '" *** Anniversary month is September ***"""|County5|Licensed Child Care Center|6:00:00|18:00:00|12|9|22|0|0|0|567890||4|567890|10/26/2012 0:00||||||||||||||||||||| '; put '" ** Anniversary month is August** """|County6|Licensed Child Care Center|7:30:00|17:30:00|0|0|19|0|0|0|678901|||678901|9/18/2014 0:00||||||||||||||||||||| '; put '" """|County7|Licensed Child Care Center|7:00:00|18:00:00|0|0|0|30|0|0|789012||4|789012|4/1/1987 0:00||||||||||||||||||||| '; put '" """|County8|Licensed Child Care Center|6:30:00|18:30:00|66|6|118|0|0|0|890123||4|890123|6/7/1996 0:00||||||||||||||||||||| ' put '|County9|Licensed Child Care Center|7:00:00|5:30:00|3|3|17|5|0|0|901234|||901234|3/10/2009 0:00||||||||||||||||||||| '; put '|County10|Licensed Child Care Center|6:00:00|23:00:00|3|3|6|0|0|0|112345||4|112345|1/17/2003 0:00||||||||||||||||||||| '; put '|County11|Licensed Child Care Center|6:00:00|23:30:00|2|0|6|4|0|0|223456||4|223456|10/15/1998 0:00||||||||||||||||||||| '; put '|County12|Licensed Child Care Center|6:30:00|18:00:00|18|9|52|0|0|0|334567||4|334567|8/23/2001 0:00|||||||||||||||||||||'; run; data WANT; infile "%sysfunc(pathname(work))\t.txt" dsd dlm='|' lrecl=800 firstobs=2 pad missover; input @; FIRSTWORD=scan(_infile_,1,'|'); if ^notdigit(compress(FIRSTWORD)) then input (a1-a21) (: $200.) @; input (a22-a40) (: $200.) ; run;
You just have to customise the input statements.
06-13-2017 10:59 PM
Thanks! Unfortunately there are several hundred of those records. I'll definitely keep this in my hat for smaller issues!
06-14-2017 12:05 AM
@ChrisNZ - I'm just suggesting that customizing the "put" statement for several hundred records could take more time than I have to spend.