Organizing, cleaning, and normalizing (smoothing) cDNA microarray data

All product names are given as examples only and they are not endorsed by the USDA or the University of Illinois.

INTRODUCTION

Click here to use SAS to analyze normalized cDNA microarray data


Click here for an example of a nicely scanned slide

GenePix Pro

November 3, 2009

...lly obvious and one can flag those spots


sjclough@illinois.edu

DOWNLOAD SOFTWARE AND FILES NEEDED

... normalization. The normalization process is done using R/maanova. Click on each of the following PERL program names to download them.

merge_imputeblank1.pl

This program collects data from the spot finding output file and does pre-normalization processing and filtering. The program was written assuming one is using GenePix .gpr data files. If you use another spot finding program, you might need to modify the program code or column headers. The program is written to find columns by name: "F635 Median", "F532 Median", and "Flag". Program written by Min Li.
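Finding columns by name rather than by position can be sketched as follows. This is a minimal Python illustration, not the Perl program itself. The helper name find_columns and the miniature sample text are hypothetical, and the sketch assumes the .gpr file is tab-delimited with the column titles on the first row that contains all three names.

```python
import csv
import io

# Column titles named in the program description above.
REQUIRED = ("F635 Median", "F532 Median", "Flag")

def find_columns(lines, required=REQUIRED):
    """Return (header_row_index, {column name: column index}).

    Scans for the first tab-delimited row containing every required
    column title, so any preamble lines before the header are skipped.
    """
    for row_index, row in enumerate(csv.reader(lines, delimiter="\t")):
        names = [cell.strip() for cell in row]
        if all(name in names for name in required):
            return row_index, {name: names.index(name) for name in required}
    raise ValueError("no header row with columns: " + ", ".join(required))

# Hypothetical miniature file; a real .gpr file has many more columns.
sample = io.StringIO(
    'preamble line (ignored)\n'
    '"Block"\t"Name"\t"F635 Median"\t"F532 Median"\t"Flag"\n'
    '1\tgeneA\t1200\t950\t0\n'
)
header_row, columns = find_columns(sample)
print(header_row, columns)
```

Because the lookup is by title, output from another spot finding program would only need its column names mapped to these three, which is the kind of modification the description mentions.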

changeflagandbadspot.pl

This program deletes flagged or weak spots (lower than the negative control) and replaces each such cell in the table with a period. SAS recognizes a "." as a missing value and takes this into account in its statistical calculations. For example, if 1 spot out of 4 reps is bad and has a "." instead of a value, SAS will base its calculations for that gene on 3 reps, not 4. Program written by Min Li.
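The masking rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the Perl program: the function name mask_bad_spots is hypothetical, a negative flag value is assumed to mark a bad spot, and "weak" is taken to mean a value below a single negative-control value supplied by the user.

```python
def mask_bad_spots(values, flags, negative_control):
    """Replace flagged or weak spot values with "." (SAS missing value).

    Assumptions for this sketch: a negative flag marks a bad spot, and
    a spot is weak when its value is below the negative-control value.
    """
    masked = []
    for value, flag in zip(values, flags):
        if flag < 0 or value < negative_control:
            masked.append(".")
        else:
            masked.append(value)
    return masked

# Four replicate values for one gene; the second replicate is flagged.
reps = [812.0, 790.0, 801.0, 805.0]
flags = [0, -100, 0, 0]
row = mask_bad_spots(reps, flags, negative_control=100.0)
print(row)

# Downstream statistics then use only the remaining replicates.
valid = [v for v in row if v != "."]
print(len(valid), sum(valid) / len(valid))
```

Here one of the four replicates becomes ".", so the mean is computed from 3 values, mirroring the SAS behavior described above.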

calculatingAverageIntensity.pl

This program provides a table of average intensities for each spot across all slides for a given RNA sample. The normalized average intensities are calculated after the normalization step and after removing flags and weak spots. In addition to providing intensity values, the program also tells how many slides had a valid value for each spot. Program written by Min Li.
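A per-spot average that also reports how many slides contributed can be sketched like this. The function name average_intensity, the spot identifiers, and the numbers are hypothetical; the sketch assumes missing values are stored as ".", as produced by the previous program.

```python
def average_intensity(per_slide_values):
    """Average one spot across slides, skipping "." missing values.

    Returns (mean or ".", number of slides with a valid value).
    """
    valid = [float(v) for v in per_slide_values if v != "."]
    if not valid:
        return ".", 0
    return sum(valid) / len(valid), len(valid)

# Hypothetical normalized intensities for two spots on four slides.
table = {
    "spot_0001": ["10.0", "12.0", ".", "11.0"],
    "spot_0002": [".", ".", ".", "."],
}
summary = {spot: average_intensity(vals) for spot, vals in table.items()}
for spot, (mean, n_valid) in summary.items():
    print(spot, mean, n_valid)
```

Reporting the count alongside the mean makes it easy to see which averages rest on only one or two slides.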

PRE-NORMALIZATION PROCESSING AND FILTERING

1. CREATE DESIGN FILE

Click here to download the demo DesignFile.txt

Click here for more information on Experimental design

In this experiment, RNA was extracted from 7 samples:
Sample 1: Time T0
Sample 2: Time T3 mock inoculated
Sample 3: Time T3 pathogen inoculated
Sample 4: Time T2 pathogen inoculated
Sample 5: Time T2 mock inoculated
Sample 6: Time T1 mock inoculated
Sample 7: Time T1 pathogen inoculated

The samples were compared on a set of 7 slides or arrays (depicted as arrows, labelled A1-A7) following a loop design.

The samples at the pointed end of the arrows were labelled with Cy5 and the samples at the other end with Cy3 dye.

Design table describing the above experiment:

Array	Dye	Sample
1	Cy3	1
1	Cy5	2
2	Cy3	2
2	Cy5	3
3	Cy3	3
3	Cy5	4
4	Cy3	4
4	Cy5	5
5	Cy3	5
5	Cy5	6
6	Cy3	6
6	Cy5	7
7	Cy3	7
7	Cy5	1

Design table for the same experiment with each array repeated (arrays 8-14 repeat arrays 1-7):

Array	Dye	Sample
1	Cy3	1
1	Cy5	2
2	Cy3	2
2	Cy5	3
3	Cy3	3
3	Cy5	4
4	Cy3	4
4	Cy5	5
5	Cy3	5
5	Cy5	6
6	Cy3	6
6	Cy5	7
7	Cy3	7
7	Cy5	1
8	Cy3	1
8	Cy5	2
9	Cy3	2
9	Cy5	3
10	Cy3	3
10	Cy5	4
11	Cy3	4
11	Cy5	5
12	Cy3	5
12	Cy5	6
13	Cy3	6
13	Cy5	7
14	Cy3	7
14	Cy5	1

NOTE: Save the design file as a tab-delimited file in .txt format (e.g., DesignFile.txt).
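For larger experiments, the rows of a loop design can be generated rather than typed by hand. The following Python sketch is one way to do it. The function names loop_design and design_text are hypothetical, and it assumes the pattern described above: on each array, sample i is labelled with Cy3 and the next sample in the loop with Cy5, with the last sample looping back to sample 1.

```python
def loop_design(n_samples, n_replicates=1):
    """Return (array, dye, sample) rows for a simple loop design.

    Array k pairs sample k (Cy3) with the next sample (Cy5); the last
    sample is paired with sample 1. Replicate loops continue the
    array numbering (for 7 samples: 8, 9, ... 14).
    """
    rows = []
    array = 0
    for _ in range(n_replicates):
        for sample in range(1, n_samples + 1):
            array += 1
            next_sample = sample % n_samples + 1
            rows.append((array, "Cy3", sample))
            rows.append((array, "Cy5", next_sample))
    return rows

def design_text(rows):
    """Format rows as tab-delimited text with a header line."""
    lines = ["Array\tDye\tSample"]
    lines += [f"{array}\t{dye}\t{sample}" for array, dye, sample in rows]
    return "\n".join(lines) + "\n"

rows = loop_design(7)
print(design_text(rows))
```

Writing the returned text to DesignFile.txt gives a tab-delimited file; calling loop_design(7, n_replicates=2) yields the 14-array layout.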

2. PUT FILES IN SAME DIRECTORY

merge_imputeblank1.pl

18kA_format_demo.txt

3. SORT .GPR FILES

gpr data files

01_MS_0921_SB02.gpr (array #1, compares sample 1 to 2)
02_MS_0990_SB02.gpr (array #2, compares sample 2 to 3)
03_MS_0920_SB02.gpr (array #3, compares sample 3 to 4)
04_MS_0923_SB02.gpr (array #4, compares sample 4 to 5)
05_MS_0922_SB02.gpr (array #5, compares sample 5 to 6)
06_MS_0925_SB02.gpr (array #6, compares sample 6 to 7)
07_MS_0924_SB02.gpr (array #7, compares sample 7 to 1)
08_MS_0913_SB02.gpr (repeat of array #1, compares sample 1 to 2)
etc.
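A quick check that the sorted file order matches the array numbers in the design file can be sketched in Python. The function name array_order is hypothetical, and the sketch assumes every file name begins with a zero-padded array number followed by an underscore, as in the listing above, so that alphabetical order equals array order.

```python
import re

def array_order(filenames):
    """Sort .gpr names and return (array number, name) pairs.

    Raises ValueError if the numeric prefix of a file does not match
    its position in the sorted list (a gap or duplicate).
    """
    ordered = sorted(n for n in filenames if n.lower().endswith(".gpr"))
    pairs = []
    for position, name in enumerate(ordered, start=1):
        match = re.match(r"(\d+)_", name)
        if match is None or int(match.group(1)) != position:
            raise ValueError(f"{name} is out of sequence at position {position}")
        pairs.append((position, name))
    return pairs

# File names from the listing above, deliberately shuffled.
files = ["03_MS_0920_SB02.gpr", "01_MS_0921_SB02.gpr", "02_MS_0990_SB02.gpr"]
pairs = array_order(files)
for number, name in pairs:
    print(number, name)
```

The zero padding matters: without it, a file starting with "10_" would sort before one starting with "2_".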

4. PRE-NORMALIZATION DATA PROCESSING

merge_imputeblank1.pl

.gpr data files

Soybean_18kA_Demo

merge_imputeblank1.pl

This is the file that you named at the start of the merge_imputeblank1.pl program. It contains all the data, ready for R/maanova normalization.

Contains all spot flag information. This file will be needed for further imputation after R normalization to remove bad spots from data.

Contains a listing of all spots with bad flags.

Contains raw data in a format similar to that of the Rinput.txt file.

Same as the "rawdata_beforeBackgroundcorrection.txt" file except that the "Blanks" and "Empties" are removed.

RUNNING R/maanova TO NORMALIZE (SMOOTH) THE DATA

Note: the following descriptions and demo were developed using R version 2.1.1.

Click here for the R/maanova functional codes. Once you are familiar with R/maanova, this set of codes (called ".Rhistory") is all you'll need to run the normalization.

1. PUT FILES INTO SINGLE FOLDER/DIRECTORY

DesignFile.txt

2. RUN MAANOVA PACKAGE IN R

>library(maanova)