Organizing Your Data for Statistical Analysis in SPSS
Edward A. Greenberg, PhD
ASU HEALTH SOLUTIONS DATA LAB
REVISED JANUARY 4, 2013

SPSS Data Sets
- Rows are cases or observations.
- Columns are variables (measurements).
- Up to 2^31 - 1 columns (2,147,483,647).
- No limit on the number of cases.

Variable Types
- Numeric (40-character maximum width)
- Dates and times (various formats)
- Other variations of numeric (currency, comma, scientific notation, etc.)
- String (32,767-character maximum length)

Variable Names
- Variable names must be unique.
- Variable names may be up to 64 characters long.
- Names can contain letters, numbers, and certain special characters.
- Names must start with a letter or with @, #, or $.
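The naming rules above can be checked mechanically before a data file is built. Below is a minimal Python sketch (Python rather than SPSS syntax, purely for illustration). The function name_problems() is hypothetical, and two details are assumptions rather than rules stated on this slide: which characters may follow the first one, and that uniqueness ignores letter case.

```python
import re

# Rules from the slide: at most 64 characters, unique names, and a first
# character that is a letter or @, #, or $.
# Assumptions (not from the slide): later characters are limited to ASCII
# letters, digits, and . _ @ # $, and uniqueness ignores letter case.
MAX_NAME_LENGTH = 64
NAME_PATTERN = re.compile(r"[A-Za-z@#$][A-Za-z0-9._@#$]*")


def name_problems(names):
    """Return (name, problem) pairs for names that break the rules above."""
    problems = []
    seen = set()
    for name in names:
        if len(name) > MAX_NAME_LENGTH:
            problems.append((name, "longer than 64 characters"))
        if not NAME_PATTERN.fullmatch(name):
            problems.append((name, "bad first character or disallowed character"))
        if name.upper() in seen:
            problems.append((name, "duplicate name"))
        seen.add(name.upper())
    return problems


# "AGE" duplicates "age"; "2ndScore" starts with a digit.
print(name_problems(["CASENO", "age", "AGE", "2ndScore", "q_7"]))
```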

Unit of Analysis
What constitutes a “case”?
- A person
- A household
- An organization
- An experimental trial

Level of Measurement
- Nominal
- Ordinal
- Interval and Ratio (grouped together as “Scale” in SPSS)

Labeling Data
- Variable names may be short and cryptic.
- Variable labels can be up to 255 characters long.
- SPSS procedures display at least 40 characters of variable labels.
- Value labels can be up to 120 characters long.

Order of Variables
- The order of variables in the SPSS data file normally should be the same as the order of items in the questionnaire.
- Use variable names that help you identify the scale or instrument to which they apply.

Case Numbers
- Each case in an SPSS file should include a case number.
- Often this will be the first variable in the file.
- The case number does not identify the subject, but it links the data record to the subject’s questionnaire.
- This link is useful for correcting data entry errors.

Create a Codebook
- When preparing to enter your data into SPSS, prepare a codebook for the data set.
- The codebook documents all of the items to be entered in the data set:
  – Variable names and labels
  – Variable types and formats
  – Coded values for categorical items
  – Missing values

Sample Codebook
VARIABLE NAME  TYPE & LENGTH  DESCRIPTION / VARIABLE LABEL                CODED VALUE / VALUE LABEL
CASENO         NUM 3          Case number
                              Variable label: Case number
SEX            STR 1          6. I am:                                    M  Male
                                                                          F  Female
AGE            NUM 2          7. My age is:                               (Code actual age in years)
EDUC           NUM 1          8. What is the highest level of education   1  No formal education
                              that you have completed?                    2  Some grade school
                              Variable label: Education level             3  Completed grade school
                                                                          4  Some high school
                                                                          5  Completed high school
                                                                          6  Some college
                                                                          7  Completed college
                                                                          8  Some graduate work
                                                                          9  A graduate degree
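A codebook like the sample above can also drive simple data-entry checks. This Python sketch stores the sample codebook as a dictionary and uses a hypothetical check_record() helper to flag values that have the wrong type, are wider than the declared length, or are not among the coded values. Treating NUM items as whole numbers and skipping blank (None) entries are assumptions made for this example, not SPSS behavior.

```python
# The sample codebook, one entry per variable. NUM items are assumed to hold
# whole numbers; "length" is the maximum number of characters or digits.
CODEBOOK = {
    "CASENO": {"type": "NUM", "length": 3, "label": "Case number"},
    "SEX": {"type": "STR", "length": 1, "values": {"M": "Male", "F": "Female"}},
    "AGE": {"type": "NUM", "length": 2},
    "EDUC": {
        "type": "NUM",
        "length": 1,
        "label": "Education level",
        "values": {
            1: "No formal education", 2: "Some grade school",
            3: "Completed grade school", 4: "Some high school",
            5: "Completed high school", 6: "Some college",
            7: "Completed college", 8: "Some graduate work",
            9: "A graduate degree",
        },
    },
}


def check_record(record):
    """Return (variable, value, problem) tuples for one case's entered data."""
    problems = []
    for name, spec in CODEBOOK.items():
        value = record.get(name)
        if value is None:  # blank entries are left to missing-value handling
            continue
        if spec["type"] == "NUM" and not isinstance(value, int):
            problems.append((name, value, "expected a number"))
        elif spec["type"] == "STR" and not isinstance(value, str):
            problems.append((name, value, "expected a string"))
        elif len(str(value)) > spec["length"]:
            problems.append((name, value, "too wide"))
        elif "values" in spec and value not in spec["values"]:
            problems.append((name, value, "not a coded value"))
    return problems


# Hypothetical mistyped case: invalid sex code, 3-digit age, education code 0.
print(check_record({"CASENO": 18, "SEX": "X", "AGE": 134, "EDUC": 0}))
```

Because each case carries its case number, a flagged record can be traced back to the paper questionnaire and corrected.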

Missing Data
Data may be missing for several reasons:
- Don’t know
- Refused to answer
- Not applicable
- Skipped a question
- Instrument problem
- Data entry omission

Missing Values
- SPSS provides several ways of designating numeric data as “missing values.”
- A blank cell is treated as “system missing,” represented by a dot (“.”) in the SPSS Data Editor.
- Specific values can be declared as “user missing” values.

Missing Values
- Up to three “user missing” values can be declared for a variable.
- Alternatively, a range of values plus one additional value can be declared missing.
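The declaration rule above (up to three discrete values, or one range plus one discrete value) can be modeled as a small data structure. The MissingSpec class below is a hypothetical Python sketch, not an SPSS API. None stands in for a blank, system-missing cell, and the codes 97, 98, and 99 in the usage example are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MissingSpec:
    """User-missing declaration for one numeric variable: up to three discrete
    values, or one range plus at most one discrete value."""
    discrete: Tuple[float, ...] = ()
    value_range: Optional[Tuple[float, float]] = None

    def __post_init__(self):
        # Enforce the declaration limits described on the slide.
        limit = 1 if self.value_range is not None else 3
        if len(self.discrete) > limit:
            raise ValueError(f"at most {limit} discrete missing value(s) allowed")

    def is_missing(self, value):
        """True for system-missing (None) or any declared user-missing value."""
        if value is None:
            return True
        if value in self.discrete:
            return True
        if self.value_range is not None:
            low, high = self.value_range
            return low <= value <= high
        return False


# Hypothetical codes: 97 = don't know, 98 = refused, 99 = not applicable.
agewed = MissingSpec(discrete=(97, 98, 99))
print([agewed.is_missing(v) for v in (23, 98, None)])  # [False, True, True]
```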

Missing Values
In this example, variable AGEWED has three labeled values that are to be treated as missing. The three values are declared to be missing in the Missing Values dialog.

Missing Values
Expressions handle missing values in different ways:
- The result of (var1 + var2 + var3)/3 is missing if any of the three variables is missing.
- The result of MEAN(var1, var2, var3) is missing only if all three of the variables are missing.
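The difference between the two expressions can be imitated in Python, with None standing in for a missing value. Both functions are hypothetical illustrations, not SPSS code. The min_valid parameter imitates SPSS's MEAN.n form (for example, MEAN.2(var1, var2, var3)), which requires at least n valid arguments.

```python
def divide_sum(var1, var2, var3):
    """Like (var1 + var2 + var3)/3: missing (None) if any operand is missing."""
    if any(v is None for v in (var1, var2, var3)):
        return None
    return (var1 + var2 + var3) / 3


def mean_of_valid(*values, min_valid=1):
    """Like MEAN(var1, var2, var3): averages the valid (non-None) arguments and
    is missing when fewer than min_valid of them are valid. With the default
    min_valid=1, the result is missing only if every argument is missing."""
    valid = [v for v in values if v is not None]
    if not valid or len(valid) < min_valid:
        return None
    return sum(valid) / len(valid)


print(divide_sum(4, None, 5))           # None
print(mean_of_valid(4, None, 5))        # 4.5
print(mean_of_valid(None, None, None))  # None
```

Note that MEAN divides by the number of valid values (here 2), so it is not simply a missing-tolerant version of dividing the sum by 3.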

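Missing Values in Procedures
The FREQUENCIES procedure excludes cases with missing values from its computations, such as the Valid Percent column and summary statistics; the missing values themselves are still listed and counted in the frequency table.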
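The distinction between listing missing values and computing with them shows up in the Percent and Valid Percent columns of a frequency table. This hypothetical Python sketch assumes that Percent uses all cases as its denominator while Valid Percent uses only non-missing cases. None represents system-missing, and user_missing lists the declared missing codes.

```python
from collections import Counter


def frequency_table(values, user_missing=()):
    """Return {value: (count, percent, valid_percent)} for one variable.
    None stands for system-missing; user_missing lists declared missing codes.
    valid_percent is None for missing rows."""
    counts = Counter(values)
    total = len(values)
    n_valid = sum(n for v, n in counts.items()
                  if v is not None and v not in user_missing)
    table = {}
    for value, n in counts.items():
        is_missing = value is None or value in user_missing
        percent = 100 * n / total  # denominator: all cases
        valid_percent = None if is_missing else 100 * n / n_valid
        table[value] = (n, percent, valid_percent)
    return table


# Hypothetical responses, with 9 declared as a user-missing code.
for value, row in frequency_table([1, 1, 2, 9, None], user_missing=(9,)).items():
    print(value, row)
```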

Multiple Responses
- Multiple-response items are questions that can have more than one value for each case.
- Two ways of coding:
  – For each possible response, create a variable that takes one of two values, e.g., 1 Yes and 2 No (“multiple dichotomy” method)
  – Create a series of variables for 1st choice, 2nd choice, etc. (“multiple categories” method)
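The two coding methods hold the same information. As a minimal sketch, assuming codes 1 through 4 stand for four possible answers and that Yes/No are coded 1 and 2 as in the example above, the hypothetical function below converts one case's “multiple categories” answers (1st choice, 2nd choice, ...) into “multiple dichotomy” form.

```python
def categories_to_dichotomies(choices, codes, yes=1, no=2):
    """Convert one case's 'multiple categories' answers (1st choice, 2nd
    choice, ...; None = no answer) into 'multiple dichotomy' form: one
    Yes/No value per possible code."""
    picked = {c for c in choices if c is not None}
    return {code: (yes if code in picked else no) for code in codes}


# Hypothetical case: 1st choice = 3, 2nd choice = 1, no 3rd choice.
print(categories_to_dichotomies([3, 1, None], codes=[1, 2, 3, 4]))
# {1: 1, 2: 2, 3: 1, 4: 2}
```

The categories method needs only as many variables as the maximum number of answers allowed; the dichotomy method needs one variable per possible answer.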

MULT RESPONSE Procedure
- In the MULT RESPONSE procedure, multiple response variables are combined into groups.
- The MULT RESPONSE procedure counts responses in multiple response groups in frequency or crosstabulation tables.
- Because a case can give more than one response, total percentages based on the number of cases generally will exceed 100%.
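Why the totals exceed 100% can be seen by counting by hand. This Python sketch tallies a multiple dichotomy group (the variable names TV, RADIO, and WEB are hypothetical) and reports each count as a percentage of all responses and as a percentage of cases. Using only cases with at least one counted response as the case denominator is an assumption of this sketch.

```python
def mult_response_counts(cases, counted_value=1):
    """cases: list of dicts mapping each dichotomy variable in the group to its
    value. Returns {variable: (count, pct_of_responses, pct_of_cases)}."""
    counts = {}
    responding_cases = 0
    for case in cases:
        hits = [var for var, value in case.items() if value == counted_value]
        if hits:
            responding_cases += 1
        for var in hits:
            counts[var] = counts.get(var, 0) + 1
    total_responses = sum(counts.values())
    return {
        var: (n, 100 * n / total_responses, 100 * n / responding_cases)
        for var, n in counts.items()
    }


cases = [
    {"TV": 1, "RADIO": 1, "WEB": 2},
    {"TV": 1, "RADIO": 2, "WEB": 1},
    {"TV": 2, "RADIO": 2, "WEB": 1},
]
for var, (n, pct_resp, pct_cases) in mult_response_counts(cases).items():
    print(f"{var:6} {n}  {pct_resp:5.1f}%  {pct_cases:5.1f}%")
```

Here three cases give five responses, so the percent-of-responses column totals 100% while the percent-of-cases column totals about 166.7%.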

Repeated Measures
- Data that are recorded on more than one occasion for each subject.
- Some procedures, such as GLM, require that all measurements for a case be on the same data record.
- Other procedures, such as the MIXED procedure, may expect one data record per occasion.

Repeated Measures
- Wide layout: one data record per subject, with one variable per occasion on which the subject is measured.
- Long layout: one data record per occasion per subject.

Repeated Measures
The good news is that SPSS allows you to easily restructure a data set:
- Restructure selected variables into cases
- Restructure selected cases into variables
- Transpose all data
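The first two restructuring options convert between one record per subject (wide) and one record per occasion per subject (long). The functions below are a hypothetical Python sketch of that idea, not SPSS's Restructure Data Wizard. The variable names CASENO and SCORE1 to SCORE3 and the output names occasion and score are illustrative.

```python
def wide_to_long(rows, id_var, measure_vars, index_name="occasion", value_name="score"):
    """Variables into cases: one output record per subject per occasion."""
    long_rows = []
    for row in rows:
        for occasion, var in enumerate(measure_vars, start=1):
            long_rows.append({id_var: row[id_var], index_name: occasion, value_name: row[var]})
    return long_rows


def long_to_wide(long_rows, id_var, measure_vars, index_name="occasion", value_name="score"):
    """Cases into variables: gather each subject's occasions onto one record."""
    wide = {}
    for r in long_rows:
        record = wide.setdefault(r[id_var], {id_var: r[id_var]})
        record[measure_vars[r[index_name] - 1]] = r[value_name]
    return list(wide.values())


measures = ["SCORE1", "SCORE2", "SCORE3"]
wide = [
    {"CASENO": 1, "SCORE1": 10, "SCORE2": 12, "SCORE3": 15},
    {"CASENO": 2, "SCORE1": 8, "SCORE2": 9, "SCORE3": 11},
]
long = wide_to_long(wide, "CASENO", measures)
for row in long:
    print(row)
print(long_to_wide(long, "CASENO", measures) == wide)  # True
```

In SPSS command syntax, the corresponding commands are VARSTOCASES and CASESTOVARS.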

Data entry omission . Missing Values SPSS provides several ways of designating numeric data as “missing values.” A blank cell is treated as “system missing,” represented by a dot (“.”) in the SPSS Data Editor. Specific values can be declared as “user missin