ARCHIVED: In SAS, how do I read data from a compressed or ZIP file?
Data sets are sometimes compressed or archived in ZIP files. You can read such a file directly in SAS without decompressing it yourself first; SAS unzips the file through the SASZIPAM engine and then reads the data set:
  FILENAME ZIPFILE SASZIPAM 'C:\Temp\filename.zip';
  DATA newdata;
    INFILE ZIPFILE(dataset.csv) DLM=',' FIRSTOBS=2;
    INPUT var1 $ var2 $ var3 $ var4;
  RUN;
The FILENAME statement assigns a fileref (e.g., ZIPFILE) to the file you wish to unzip, names the engine (SASZIPAM) to be used to decompress the file, and gives the directory and name of the file to be unzipped (e.g., 'C:\Temp\filename.zip'). The DATA statement names the SAS data set that will be created (e.g., newdata).
The INFILE statement gives information about the data set within the ZIP file. First, the name in parentheses after the fileref ZIPFILE indicates which file should be unzipped (e.g., dataset.csv), since there may be multiple files in filename.zip. Second, DLM signals which character is used as a delimiter in the data set (here, a comma). Finally, FIRSTOBS=2 tells SAS to begin reading at the second line, because the first line in the data set contains the variable names. In the INPUT statement, list the variables to be read into the new data set; a $ after a variable name marks it as a character variable, so here var1 through var3 are character and var4 is numeric.
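If the CSV file may contain empty fields or quoted values, a slightly more defensive variant of the program above can help. This is a minimal sketch rather than part of the original example: it assumes the same illustrative file and variable names, adds the standard INFILE options DSD and TRUNCOVER, and prints the first few observations as a check.

```sas
/* Sketch only: same illustrative path, member, and variable names as above. */
FILENAME ZIPFILE SASZIPAM 'C:\Temp\filename.zip';

DATA newdata;
  /* DSD reads two consecutive delimiters as a missing value and strips   */
  /* quotation marks from quoted values. TRUNCOVER keeps SAS from moving  */
  /* to the next line when a record has fewer fields than expected.       */
  INFILE ZIPFILE(dataset.csv) DLM=',' DSD TRUNCOVER FIRSTOBS=2;
  INPUT var1 $ var2 $ var3 $ var4;
RUN;

/* Print the first five observations to confirm the data were read as expected. */
PROC PRINT DATA=newdata(OBS=5);
RUN;
```

With simple list input, character variables default to a length of 8, so longer values may need a LENGTH statement before the INPUT statement.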
Note: Large compressed data files (e.g., .txt and .csv formats) are not suited for use with the SASZIPAM engine, because decompressing them can consume all available memory and considerably slow other processes. Moreover, the SASZIPAM engine cannot decompress every ZIP file (e.g., ZIP files created by 7-Zip are not compatible). In that case, consider first decompressing the file and then recompressing it with WinZip, which is compatible with SASZIPAM.
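For a large file, one option in keeping with the note above is to extract the file outside SAS and read the extracted copy with an ordinary fileref. The following is a minimal sketch under that assumption; the extracted path 'C:\Temp\dataset.csv' and the fileref CSVFILE are hypothetical.

```sas
/* Sketch only: assumes dataset.csv was already extracted to C:\Temp outside SAS. */
FILENAME CSVFILE 'C:\Temp\dataset.csv';

DATA newdata;
  /* Same options as the ZIP example: comma-delimited, skip the header line. */
  INFILE CSVFILE DLM=',' FIRSTOBS=2;
  INPUT var1 $ var2 $ var3 $ var4;
RUN;
```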
SASZIPAM is available in SAS 9. In addition to data files, you can use SASZIPAM to unzip log files.
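Reading a zipped log file follows the same pattern as reading a data file. The sketch below assumes a hypothetical archive 'C:\Temp\logs.zip' containing a member named run.log, and simply copies each line of that member to the current SAS log.

```sas
/* Sketch only: logs.zip, run.log, and the fileref ZIPLOG are hypothetical names. */
FILENAME ZIPLOG SASZIPAM 'C:\Temp\logs.zip';

DATA _NULL_;           /* no output data set is needed */
  INFILE ZIPLOG(run.log);
  INPUT;               /* read one line into the input buffer */
  PUT _INFILE_;        /* write that line to the current SAS log */
RUN;
```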
If you have questions about using statistical and mathematical software at Indiana University, contact the UITS Research Applications and Deep Learning team.
This is document azva in the Knowledge Base.
Last modified on 2023-05-09 14:38:09.