What are the data files required for the analysis?
- mRNA sequence data
- Network (pathways) file exported from Cytoscape
- Protein expression data
It would be best if you remembered before executing the script given by this tutorial that the naming convention for genes or protein in each data file is different.
For instance, the pathway obtained from the Cytoscape database, which depends on what database you refer to, sometimes has its particular naming convention to simplify such a complicated protein-protein interaction or signaling cascade.
We recommend the best practice to create a lookup table from each data based on the name of genes, proteins, or molecules as a reference library.
It is also possible that each file may use different names for a specific target because of the synonym. Therefore, we recommend visiting PDB website to make sure you are not missing possible matches among genes, proteins, or molecules from these data files.
Structure of each Data File
1. mRNA Sequence Data
- Input file name: mRNAdata.csv
How this data set was collected:
- The mRNA sequence data file contains the genes uniquely upregulated in M1 macrophage compared to M0 phenotype. This process was based on the literature published in 2019.
Description per Column:
- MGI symbol: We refer this as gene name
- mRNA seq Expression Level (rawdata) in M1 macrophage (the unit of RPM)
- mRNA seq Expression Level (rawdata) in M2 macrophage (the unit of RPM)
- mRNA seq Expression Level (rawdata) in M2 macrophage treated with M1 MEV (denoted as M1M2) (the unit of RPM)
2. Network (Pathway) Exported from Cytoscape
How this data set was collected:
- Cytoscape provides an excellent toolkit for editing pathways, including merging networks, defining the characteristics of edges, adding or deleting nodes (edge denotes “protein-protein or signaling interaction” and node denotes “molecule or protein complex”).
The demonstrated figure of the network, which is the outcome of merging multiple pathways for M1 and M2 polarization, can be exported as a sif file containing the information of protein-protein molecule-molecule interaction/signaling cascades in the text as shown in the below figure.
- Input file name: expanded_IL6_modified_receptor.sif
How to interpret this text file:
- The fundamental structure of this data file is A -> B, which indicates directed network.
- Directed network means that unless the text file indicates, the interaction is valid in one way as shown.
Description per Column:
- Node names for the starting of the interaction between two proteins or molecules or complexes
- Edge characteristics
- Node names that is linked to the starting node
3. List of Receptors
- Input file name: receptorlist.txt
- If one of nodes added as a new network to the library is receptor, please list the receptor name in Network (sif file) to here.
- pro/anti indicates whether it is pro/anti-inflammatory receptor
4. Protein Expression Data
How this data set was collected:
- Input file name: Ab_chris.csv
- The data was experimentally collected to characterizae M1 and M2 macrophages.
- Value in the 3rd column indicates the ratiomatric scale change in expression from M1 to M2. Therefore, below and above 1 indicate down-regulation and up-regulation respectively.
- Because of name convension, there is an additional column (2nd) to list the node names that are equivalent to the protein names.
Description per Column:
- Protein names reported via experiment
- Equivalent node names in Cytoscape-generated pathway
- Ratiomatric value from M1 to M2
5. List of Ligands
- Input file name: ligandlist.yaml