# How to use SPSS software to analyze data for a research paper

## About SPSS and research papers

In this article, you'll learn how to use SPSS software to analyze data for a research paper.

I used IBM SPSS Statistics v19 on my 64-bit Windows 8.1 operating system to analyze data for this article.

**Note**: **SPSS** is not freeware; besides the Windows version, a Linux version is available as well. **PSPP** is a **free** alternative to SPSS. It can open SPSS data files (.sav), but not SPSS output files (.spo).

The inspiration for this quick-start guide was a diploma dissertation for the Faculty of Medicine. I was hired to do the complete statistical analysis and other technical work, such as producing improved diagrams.

In this beginners’ guide you’ll learn how to create an SPSS matrix and run the most common analyses. Since SPSS is a complex piece of software, this guide is meant only as an introduction to its features.

For this research paper, women filled in a questionnaire; the gathered data was entered into an SPSS matrix and processed further to complete the study.

The topic of this research paper is **contraception methods**; it’s an interesting and “ticklish” subject, so one of the goals is to mention some current real-world data.

### Define the problem and research goals

Before you start, you first need to define the problem and the research goals of your paper.

In this case, we are *analyzing the factors that influence the frequency of contraception usage*.

### Goals of this paper

1. Determine the prevalence of contraception usage

2. Determine which factors contribute to increased contraception usage

3. Determine if there are negative attitudes regarding contraception usage

### Research hypothesis

The following hypotheses were derived from these goals:

H1 - contraception prevalence in the examined population is higher than 60%

H2 - a woman’s age has a positive influence on contraception usage

H3 - the education of a woman and of her partner has a positive influence on contraception usage

H4 - contraception usage is more frequent in cities than in rural places

H5 - the type of emotional relationship is a very important factor in determining contraception usage

H6 - having children has a positive influence on contraception usage

H7 - the desired number of children has a positive influence on contraception usage

H8 - the frequency of sexual intercourse has a positive influence on contraception usage

H9 - there are negative attitudes towards contraception usage in the examined population

H10 - side effects of and misconceptions about contraceptive pills have a stronger influence on their usage than their beneficial effects

### A bit on research variables

This research paper has dependent and independent variables. The independent variables are: age, residence, education, long-term relationship, type of relationship, existence of children, number of children, and frequency of sex. The dependent variables are contraception usage, frequency of contraception usage, and contraception method.

## How to form a questionnaire

As a research method, we used an anonymous questionnaire, custom-built to gather data for this research.

Unfortunately, I had to remove the scanned image of this questionnaire due to legal concerns (the library staff of the Faculty of Medicine made a complaint). This section therefore remains empty, and it’s up to you to design your own questionnaire.

## How to form a matrix in SPSS

To form a matrix in SPSS, you need to extract the gathered data from your questionnaires and type it into a spreadsheet-like table. Every question must be coded appropriately and entered in the matrix. SPSS has two basic views: **Variable View** and **Data View**.

Note: here is an example of our matrix, limited to 20 observations (I can’t attach the full version due to legal issues): DOWNLOAD

Open this matrix to see the difference between the two views. Click on *Variable View* to see the variables and their types, labels, values and other properties:

Then, click on *Data View* to see the actual values of these variables:

OK, the first thing you have to do is define the parameters (or *attributes*) of your variables. Pay attention to the difference between these two situations:

- **single answer** question – represents one variable *(e.g. “Residence”)*,
- **multiple answers** question – it can’t be represented by one variable, so we break it down into “sub-variables” *(e.g. “Which contraception method do you use?”)*.

Carefully examine the structure of your questionnaire to define your variables in the matrix. Variables are defined in rows, and their parameters are defined in columns.

When you define a **Name**, choose a short and intuitive one, because it is displayed in *Data View*; **Label** is the description of your variable; **Values** maps discrete numerical codes to the answers they stand for; **Missing** defines the codes that stand for missing values (values you forgot to type in, or questions that don’t have a (valid) answer).
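To make these attributes concrete, here is a minimal Python sketch of one variable definition. It is only an illustration outside of SPSS: the class `Variable`, the label text, the code 2 for "village" and the missing code 9 are all assumptions made for this example, not the real definitions from the study.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One row of Variable View, reduced to the attributes discussed above."""
    name: str                                    # short name shown in Data View
    label: str                                   # longer description of the variable
    values: dict = field(default_factory=dict)   # numeric code -> answer text
    missing: tuple = ()                          # codes that mean "no valid answer"

    def is_missing(self, value):
        """True when a cell is empty or holds one of the declared missing codes."""
        return value is None or value in self.missing


# Hypothetical definition: only "1 = city" appears in the article itself.
residence = Variable(
    name="residence",
    label="Place of residence",
    values={1: "city", 2: "village"},
    missing=(9,),
)

print(residence.values[1], residence.is_missing(9))
```

Keeping the missing codes next to the value labels mirrors how the two columns sit side by side in Variable View.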

In SPSS we use numerical values to represent all the answers. Don’t let this confuse you: you have to translate the answers into numbers (e.g. Yes – 1, No – 2). We cover this concept further below.

When assigning numbers to the answers from the questionnaire, you’ll notice 4 different situations:

1. if a numerical input is expected (e.g. question 1, “Age”) – that is a piece of cake; there is no problem assigning the numerical value, because age is already expressed as a number;

2. if one answer is to be circled (e.g. question 2, “Residence”) – we assign a numerical value to every option from the questionnaire (answers: a, b, c, …). See the picture below:

First, make sure that *Variable View* is open. Then click on the *Values* cell, and when a small rectangle appears, click on it. A new dialog opens (the one in focus in the picture above), where we assign a numerical value to a specific answer (1 – city). After you assign a number to an answer, click on **Add**;

3. if multiple answers can be circled in the questionnaire (e.g. question 12, about the choice of contraception method) – this can’t be coded as a single number, only as multiple Yes/No variables. Every option that can be circled becomes a new variable. See this screenshot:

So, we break this complex variable down into multiple sub-variables that can each be *Yes* or *No* (that is, 1 or 2, once we assign numbers to these options).

**Note**: we didn’t mention *abstinence*, the only 100% successful contraception method. According to one study, the most frequent contraception method is “*coitus interruptus*”, or the pull-out method. This is one of the most unreliable contraception methods, so we are not surprised to hear about an increase in abortions or “forced” marriages.

Theoretically speaking, if you want a **natural** contraception method, use this combination: monitor your menstrual cycle to identify your fertility window + abstinence during the fertility window + *coitus interruptus* (the pull-out method). We say “theoretically” because we encourage you to use other, more effective contraception methods. This was a small digression, so let’s get back to the topic.

4. answers based on the **Likert scale** (1-5). If you have understood everything so far, this one is rather easy to explain. See the picture below for how to enter values according to the Likert scale:

OK, with this part covered, you can probably now form the matrix, variables and data on your own.
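The four situations above can be sketched in a few lines of Python. This is not SPSS code, only an illustration of the coding logic; the function names, the option list in `METHODS`, the code 2 for "village" and the variable name `attitude_1` are assumptions made for this example.

```python
# Hypothetical codebook: only "1 = city" and "Yes = 1, No = 2" come from the article.
RESIDENCE_CODES = {"city": 1, "village": 2}
YES, NO = 1, 2
METHODS = ["condom", "pill", "coitus_interruptus"]  # illustrative option list
LIKERT_MIN, LIKERT_MAX = 1, 5


def encode_single(answer, codebook):
    """Situation 2: one circled answer becomes one numeric code."""
    return codebook[answer]


def encode_multiple(circled, options):
    """Situation 3: every option becomes its own Yes/No sub-variable."""
    return {f"method_{opt}": (YES if opt in circled else NO) for opt in options}


def encode_likert(value):
    """Situation 4: a Likert answer is stored as-is after a range check."""
    if not LIKERT_MIN <= value <= LIKERT_MAX:
        raise ValueError(f"Likert value out of range: {value}")
    return value


# Situation 1: a numeric answer such as age is entered directly.
row = {"age": 22, "residence": encode_single("city", RESIDENCE_CODES)}
row.update(encode_multiple({"condom", "pill"}, METHODS))
row["attitude_1"] = encode_likert(4)
print(row)
```

One multiple-answer question with three options ends up as three columns in the matrix, which is the same breakdown the Values dialog forces you to do by hand.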

Now switch to **Data View**. In *Data View*, the variables are already laid out in columns, and the rows are empty; each row represents an *observation* (one row = one observation = one filled questionnaire).

If this is not quite clear, let’s look at an example from our matrix (to cut a long story short):

In this screenshot, we see the first observation in *Data View*. Let’s read this data: the respondent is 22 years old, she lives in a town, she has completed high school (and no further education), her partner has a university degree, he is a long-term partner and she is in a long-lasting relationship, she has no children (child count = 0), she rarely has sex, and she always uses some contraception method – a *condom* or a *pill*. (Of course, data entry goes in the reverse direction: look at your questionnaire, check which numerical values you assigned to the answers, and enter the data in *Data View*.)
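Reading an observation back, as in the example above, is just a lookup in the value labels. The following Python sketch shows the idea; the variable names and codes are hypothetical and only loosely modelled on the observation described in the text.

```python
# Hypothetical value labels (code -> answer text) for a few variables.
VALUE_LABELS = {
    "residence": {1: "city", 2: "village"},
    "has_children": {1: "yes", 2: "no"},
    "uses_condom": {1: "yes", 2: "no"},
}


def decode_row(row, value_labels):
    """Replace numeric codes with their labels; plain numbers stay unchanged."""
    decoded = {}
    for name, value in row.items():
        labels = value_labels.get(name)
        decoded[name] = labels.get(value, value) if labels else value
    return decoded


# One row of Data View = one filled questionnaire.
observation = {
    "age": 22,
    "residence": 1,
    "has_children": 2,
    "children_count": 0,
    "uses_condom": 1,
}
print(decode_row(observation, VALUE_LABELS))
```

Variables without value labels, such as age or the child count, pass through untouched, which matches how numeric inputs are stored in the matrix.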

The process of entering all data in the matrix is really a pain in the ass, but you must enter all data in order to proceed to the statistics part.

When you have entered all the data, save your matrix. Note that all operations in SPSS are logged in the *Output* window: loading a matrix, saving a matrix, generating statistics and other activities. When you attempt to close the Output window, a dialog will appear, asking whether you want to save changes. Don’t confuse this with your matrix: the Output window is separate from your matrix and doesn’t affect it in any way. Usually you don’t have to save the Output window.

## Statistics in SPSS software

You’ll probably start off with some basic descriptive statistics and frequency analysis.

Select the menus as follows: **Analyze –> Descriptive Statistics –> Frequencies…**

When the **Frequencies** window appears, select variables by clicking on the right arrow:

Click on the **Charts…** button and select the **Histograms** radio button (include the *normal curve* on the histogram as well):

Click on the **Continue** button, then on **OK**, to open the results in the Output window. All operations are logged in the Output window (e.g. saving a file, loading documents, performing tests, and so on):

It is important not to have any *missing values*, because we want complete data. The missing count is also an indicator of whether you correctly entered all the data you gathered.
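The same completeness check can be sketched outside SPSS. The Python below counts, per variable, how many observations have no value; the function name `count_missing` and the sample rows are made up for this illustration, and an empty cell is assumed to be represented by `None` or by an absent key.

```python
def count_missing(rows, variables):
    """Count observations where a variable is absent or empty (None)."""
    return {
        var: sum(1 for row in rows if row.get(var) is None)
        for var in variables
    }


# Made-up observations: the second has no age, the third has no residence.
rows = [
    {"age": 22, "residence": 1},
    {"age": None, "residence": 2},
    {"age": 19},
]
print(count_missing(rows, ["age", "residence"]))
```

A non-zero count for any variable is a hint to go back to the paper questionnaires before running further statistics.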

**N** is the sample size (the number of women who filled in the questionnaire).

You interpret the data in your table like this: the number of 16-year-olds is 3, which is 10% of the overall sample; Cumulative Percent sums all the Percent values in the column up to and including the current row. According to this sample, women aged 16-20 make up 20% of the sample. See the Serbian version of this article, which I wrote on a Linux operating system using SPSS 16, because there I used all the observations, not only 20.
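To see how the Frequency, Percent and Cumulative Percent columns relate to each other, here is a small Python sketch that builds the same kind of table. It is an illustration only; the function name `frequency_table` and the list of ages are invented for this example and are not the study's data.

```python
from collections import Counter


def frequency_table(values):
    """Build rows with frequency, percent and cumulative percent per value."""
    counts = Counter(values)
    n = len(values)
    table = []
    cumulative_count = 0
    for value in sorted(counts):
        cumulative_count += counts[value]
        table.append({
            "value": value,
            "frequency": counts[value],
            "percent": 100.0 * counts[value] / n,
            # Cumulative percent covers this row and every row above it.
            "cumulative_percent": 100.0 * cumulative_count / n,
        })
    return table


ages = [16, 16, 17, 18, 18, 18, 20, 22, 22, 25]  # made-up sample, N = 10
for row in frequency_table(ages):
    print(row)
```

The cumulative column is computed from running counts rather than by adding up rounded percentages, so the last row always ends at exactly 100.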

The Output window shows the selected chart as well:

If you have multiple charts in the Output window, here is an easy way to insert all of them into your research paper document: right-click on an empty area of the Output window and select **Export…**:

Use any word processor (Microsoft Office Word, LibreOffice, OpenOffice) to open and edit the exported document; cut or copy the desired charts or other info into your paper.

You can create charts without performing any tests or statistics. Use the **Graphs** menu to easily create charts or graphs; use *Chart Builder* to customize in detail what your charts show. A simpler alternative is to use **Legacy Dialogs**, e.g. the Pie chart:

We are interested in summaries for groups of cases:

Define slices by the *number of children* variable:

This is a basic form of the resulting pie chart:

So, in this situation, we have a “bare” chart, with no statistics. Double-click on your chart to edit it, e.g. to add percentages, change dimensions, edit labels, and so on.

### Chi-Square test in SPSS

The *Chi-Square Test* is probably the most used test in SPSS for research papers. To perform this test, click on the *Analyze* menu and navigate to **Descriptive Statistics –> Crosstabs…**:

When the **Crosstabs** dialog appears, select the desired variables in *Row(s)* and *Column(s)* by clicking on the corresponding arrows:

Click on the **Statistics…** button; tick the following checkboxes and click on the **Continue** button:

Click on the **Cells** button to include percentages; tick the checkboxes as follows:
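What Crosstabs produces with percentages enabled can be sketched in Python as a table of counts plus row percentages. This is only an illustration; the function names, the variable names `residence` and `uses_contraception`, and the eight sample rows are assumptions made for this example.

```python
from collections import Counter


def crosstab(rows, row_var, col_var):
    """Return (table of counts, row categories, column categories)."""
    counts = Counter((r[row_var], r[col_var]) for r in rows)
    row_values = sorted({rv for rv, _ in counts})
    col_values = sorted({cv for _, cv in counts})
    table = [[counts[(rv, cv)] for cv in col_values] for rv in row_values]
    return table, row_values, col_values


def row_percentages(table):
    """Express every cell as a percentage of its row total."""
    return [[100.0 * cell / sum(row) for cell in row] for row in table]


# Made-up data: residence (1 = city, 2 = village) by usage (1 = yes, 2 = no).
data = (
    [{"residence": 1, "uses_contraception": 1}] * 3
    + [{"residence": 1, "uses_contraception": 2}] * 1
    + [{"residence": 2, "uses_contraception": 1}] * 1
    + [{"residence": 2, "uses_contraception": 2}] * 3
)
table, row_values, col_values = crosstab(data, "residence", "uses_contraception")
print(table)
print(row_percentages(table))
```

Row percentages answer the question "within each place of residence, what share uses contraception?", which is usually the comparison a hypothesis like H4 needs.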

Results can be reviewed in the Output window:

The Chi-Square test results are shown in the second section of the output:

The following parameters are to be included in your paper:

- **Value** – the value of the test statistic,
- **df** – the *degrees of freedom* value; some literature uses the symbol “n” instead of “df”,
- **sig** – the *statistical significance*; some literature uses the symbol “p” instead of “sig”.
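For readers who want to see where Value and df come from, here is a minimal Python sketch of the Pearson chi-square statistic for a table of observed counts. It is an illustration, not SPSS's own code; the function name `chi_square` and the example counts are invented, and no continuity correction is applied.

```python
def chi_square(observed):
    """Return (Pearson chi-square statistic, degrees of freedom).

    observed is a list of rows of counts, e.g. [[30, 10], [20, 40]].
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Made-up 2x2 table: rows = residence, columns = uses contraception (yes / no).
value, df = chi_square([[30, 10], [20, 40]])
print(round(value, 3), df)
```

The statistic grows as the observed counts move away from the counts expected under independence, and df depends only on the number of rows and columns.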

The Chi-Square test requires no assumptions about the shape of the population distribution from which the sample was drawn. However, like all inferential techniques, it assumes random sampling. It can be applied to variables measured at a nominal and/or an ordinal level of measurement. Hypotheses are set as follows:

- **Ho**: states that no association exists between the two cross-tabulated variables in the population, and therefore the variables are statistically independent (this is the so-called *null hypothesis*),
- **Ha**: states that the two variables are dependent (related) in the population (this is the so-called *alternative hypothesis*).

The **level of statistical significance** is often expressed as the so-called p-value. Depending on the statistical test you have chosen, you will calculate a probability (i.e., the p-value) of observing your sample results given that the null hypothesis is true.

So, if “p” <= 0.05, our test is *statistically significant*, and the null hypothesis (**Ho**) is rejected. To conclude:

- if **“p” <= 0.05** – **Ha** is accepted and **Ho** is rejected,
- if **“p” > 0.05** – **Ho** is retained (we fail to reject it) and **Ha** is not supported.
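The decision rule can be written down as a short Python sketch. SPSS reports the p-value for you; the code below only illustrates how a p-value follows from the chi-square value and df, and how it is compared with 0.05. The function names `chi2_p_value` and `decide` are made up for this example, and the p-value uses a standard recurrence for the chi-square distribution with whole-number degrees of freedom.

```python
import math


def chi2_p_value(statistic, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if df < 1 or statistic < 0:
        raise ValueError("df must be >= 1 and statistic must be >= 0")
    half = statistic / 2.0
    if df % 2 == 1:
        p = math.erfc(math.sqrt(half))  # exact result for df = 1
        nu = 1
    else:
        p = math.exp(-half)             # exact result for df = 2
        nu = 2
    while nu < df:
        # Step from nu to nu + 2 degrees of freedom.
        p += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return p


def decide(p_value, alpha=0.05):
    """Apply the rule above: reject Ho when p <= alpha."""
    return "reject Ho" if p_value <= alpha else "fail to reject Ho"


p = chi2_p_value(16.667, 1)  # made-up chi-square value with df = 1
print(decide(p))
```

The threshold `alpha` is kept as a parameter because 0.05 is a convention, and some papers use a stricter level such as 0.01.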

To perform frequency item analysis use the identical procedure as you did with the frequency analysis:

If this article becomes popular, I will write a sequel article with further guides. I’m looking forward to seeing your comments.
