What are the factors affecting Graduate Admissions in America for Students?¶

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "./input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
print(os.listdir("./input"))

# Any results you write to the current directory are saved as output.
import seaborn as sns
import matplotlib.pyplot as plt
# reading dataset
df=pd.read_csv('./input/Admission_Predict_Ver1.1.csv')

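#importing plotly (offline mode only)
# plotly.plotly (online mode) is not used below; in plotly 4+ it lives in the separate chart_studio package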
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
import plotly.graph_objs as go
import cufflinks as cf
cf.go_offline()

['Admission_Predict_Ver1.1.csv', 'binary.csv']

Data Statistics and a sneak peek into the data:¶

#General data statistics
display(df.head())
df.info()
display(df.describe())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 9 columns):
Serial No.           500 non-null int64
GRE Score            500 non-null int64
TOEFL Score          500 non-null int64
University Rating    500 non-null int64
SOP                  500 non-null float64
LOR                  500 non-null float64
CGPA                 500 non-null float64
Research             500 non-null int64
Chance of Admit      500 non-null float64
dtypes: float64(4), int64(5)
memory usage: 35.2 KB

Key Highlights from dataset:¶

Average GRE Score: 316.47
Average TOEFL Score: 107.19
Average CGPA: 8.58
With Research: 56% of applicants
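
The highlights above can be recomputed directly from the DataFrame. The following is a minimal sketch that uses only the five rows shown by df.head() as a stand-in for the full file, so its numbers differ from the 500-row averages listed above. The helper name key_highlights is hypothetical.

```python
import pandas as pd

# Five rows copied from the df.head() output of this notebook (not the full dataset).
sample = pd.DataFrame({
    "GRE Score": [337, 324, 316, 322, 314],
    "TOEFL Score": [118, 107, 104, 110, 103],
    "CGPA": [9.65, 8.87, 8.00, 8.67, 8.21],
    "Research": [1, 1, 1, 1, 0],
})


def key_highlights(frame):
    """Return average scores and the percentage of applicants with research."""
    return {
        "avg_gre": round(float(frame["GRE Score"].mean()), 2),
        "avg_toefl": round(float(frame["TOEFL Score"].mean()), 2),
        "avg_cgpa": round(float(frame["CGPA"].mean()), 2),
        # Research is a 0/1 flag, so its mean is the share of applicants with research.
        "research_pct": round(100 * float(frame["Research"].mean()), 1),
    }


highlights = key_highlights(sample)
print(highlights)
```

Calling key_highlights(df) on the full DataFrame should give the figures listed above.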

Checking if plotly and cufflinks are working correctly

#data = [go.Histogram(x=df["GRE Score"])]
# checking if plotly and cufflinks are working correctly
df['GRE Score'].iplot(kind="hist", bins=40,title="GRE Score Distribution")

layout1 = cf.Layout(
    height=600,
    width=800,
    margin=dict(
        l=200
    )
)
df.corr().iplot(kind='heatmap',colorscale='spectral', title = 'Correlation between different features', 
    layout=layout1)

The features most strongly correlated with Chance of Admit are:¶

CGPA
GRE Score
TOEFL Score
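
The ranking can also be read off numerically instead of from the heatmap. The following is a minimal sketch, assuming the column names used in this notebook (note the trailing space in 'Chance of Admit ' and 'LOR ') and using only the five df.head() rows as sample data. rank_by_correlation is a hypothetical helper, and the ordering on five rows is not expected to match the full dataset.

```python
import pandas as pd

# Five rows copied from the df.head() output of this notebook (not the full dataset).
sample = pd.DataFrame({
    "GRE Score": [337, 324, 316, 322, 314],
    "TOEFL Score": [118, 107, 104, 110, 103],
    "University Rating": [4, 4, 3, 3, 2],
    "SOP": [4.5, 4.0, 3.0, 3.5, 2.0],
    "LOR ": [4.5, 4.5, 3.5, 2.5, 3.0],
    "CGPA": [9.65, 8.87, 8.00, 8.67, 8.21],
    "Research": [1, 1, 1, 1, 0],
    "Chance of Admit ": [0.92, 0.76, 0.72, 0.80, 0.65],
})


def rank_by_correlation(frame, target):
    """Pearson correlation of every other column with `target`, highest first."""
    corr_with_target = frame.corr()[target].drop(target)
    return corr_with_target.sort_values(ascending=False)


ranked = rank_by_correlation(sample, "Chance of Admit ")
print(ranked)
```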

df['Admit Chance']=pd.cut(np.array(df['Chance of Admit ']),3, labels=["bad", "medium", "good"])
#new_labels[:5]
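
Passing an integer to pd.cut splits the observed range into that many equal-width bins, so "bad", "medium" and "good" each cover one third of the range between the lowest and highest admission chance, not one third of the applicants. The following is a minimal sketch: the minimum (0.34) and maximum (0.97) come from df.describe(), while the values in between are hypothetical.

```python
import pandas as pd

# 0.34 and 0.97 are the min and max from df.describe(); the other values are made up.
chance = pd.Series([0.34, 0.50, 0.65, 0.72, 0.80, 0.92, 0.97])

# retbins=True also returns the bin edges that pd.cut computed.
labels, edges = pd.cut(chance, 3, labels=["bad", "medium", "good"], retbins=True)

print(edges)           # the two inner edges lie near 0.55 and 0.76
print(labels.tolist())
```

If bins with roughly equal numbers of applicants were wanted instead, pd.qcut would be the function to look at.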

How good are your chances of acceptance at a university, based on your scores?¶

NOTE: You can turn the markers on and off by clicking the legend entries in the charts below.

scores_attr=['CGPA', 'GRE Score', 'TOEFL Score']
for i in scores_attr:
    df.iplot(x=i,y='University Rating',categories='Admit Chance',colors=['green','blue','red'],
            xTitle=i,yTitle='University Rating',title=f'Chances of Acceptance based on your {i}')

How do your SOP and LOR affect your chances of getting accepted?¶

color_dict={'good':'seagreen','medium':'skyblue','bad':'indianred'}
df_grouped=df.groupby(['SOP','LOR ','Admit Chance']).size().reset_index(name='counts')
#df_grouped.head()
df_grouped.iplot(kind='bubble',x='SOP',y='LOR ',xTitle='SOP',yTitle='LOR',title='Distribution of SOP and LOR with acceptance chances',
                 size='counts',text='Admit Chance',colors=df_grouped['Admit Chance'].map(color_dict).tolist())
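
The bubble sizes come from counting how many applicants share each (SOP, LOR, Admit Chance) combination. Because 'Admit Chance' is a categorical column, some pandas versions may also emit zero-count rows for combinations that never occur. The following is a minimal sketch on hypothetical data (df_demo is not part of the dataset) that passes observed=True explicitly to keep only combinations that actually appear.

```python
import pandas as pd

# Hypothetical demo rows; the column names mirror the notebook ('LOR ' has a trailing space).
df_demo = pd.DataFrame({
    "SOP": [4.5, 4.5, 3.0, 3.0, 2.0],
    "LOR ": [4.5, 4.5, 3.5, 3.5, 3.0],
    "Admit Chance": pd.Categorical(
        ["good", "good", "medium", "medium", "bad"],
        categories=["bad", "medium", "good"],
    ),
})

# size() counts rows per group; reset_index(name=...) turns the result into a flat table.
counts = (
    df_demo.groupby(["SOP", "LOR ", "Admit Chance"], observed=True)
    .size()
    .reset_index(name="counts")
)
print(counts)
```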

This is as expected: students with good SOPs and good LORs have better acceptance chances, although there are some exceptions.¶

Ideally, students who are good at academics should also have good GRE and TOEFL scores. Let's check this hypothesis below.¶

Zoom in, rotate, check the values in the 3d plot below

studious_students=df[df['CGPA'] > 8]
studious_students.iplot(kind='scatter3d', x='GRE Score', y='TOEFL Score',z='CGPA',mode='markers', xTitle='GRE Score',yTitle='TOEFL Score',zTitle='CGPA',
                        title='GRE vs TOEFL vs CGPA')

Our hypothesis seems to be true.
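
One way to put a number on the visual impression is to correlate GRE and TOEFL scores with CGPA within the same CGPA > 8 subset. The following is a minimal sketch, using only the five df.head() rows as sample data, so the values it prints say nothing about the full dataset.

```python
import pandas as pd

# Five rows copied from the df.head() output of this notebook (not the full dataset).
sample = pd.DataFrame({
    "GRE Score": [337, 324, 316, 322, 314],
    "TOEFL Score": [118, 107, 104, 110, 103],
    "CGPA": [9.65, 8.87, 8.00, 8.67, 8.21],
})

# Same filter as in the notebook: strictly greater than 8, so the 8.00 row is dropped.
studious = sample[sample["CGPA"] > 8]

# Pearson correlation of each test score with CGPA inside the subset.
corr_with_cgpa = studious[["GRE Score", "TOEFL Score"]].corrwith(studious["CGPA"])
print(corr_with_cgpa)
```

Positive values close to 1 would support the hypothesis; values near 0 would not.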

Next, let's check the relationship between SOP and LOR for students with research experience. A reasonable guess is that students with research experience have a good LOR and SOP.¶

df_research_grouped=df.groupby(['SOP','LOR ','Research']).size().reset_index(name='counts')

import plotly.tools as tls

fig = tls.make_subplots(rows=1, cols=2, shared_yaxes=True)
                       
df_non_research=df_research_grouped[df_research_grouped['Research']==0]
df_research=df_research_grouped[df_research_grouped['Research']==1]
fig.append_trace({'x': df_non_research.SOP, 'y': df_non_research['LOR '],'text':df_non_research['counts'],'type': 'scatter', 'name': 'Non Research','mode':'markers'}, 1, 1)
fig.append_trace({'x': df_research.SOP, 'y': df_research['LOR '], 'type': 'scatter','text':df_research['counts'], 'name': 'Research','mode':'markers'}, 1, 2)
fig['layout']['xaxis1'].update(title='SOP')
fig['layout']['xaxis2'].update(title='SOP')
fig['layout']['yaxis1'].update(title='LOR')
fig['layout'].update(hovermode= 'closest')
fig['layout'].update(title='SOP vs LOR for applicants with Research & Non Research experience')

cf.iplot(fig)

This is the format of your plot grid:
[ (1,1) x1,y1 ]  [ (1,2) x2,y1 ]

Hover over the points and compare the counts in the top-right corners of both plots: applicants with research experience cluster at higher SOP and LOR ratings, which suggests that research helps strengthen your SOP and LOR.¶
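
The hover counts can be complemented by comparing the average SOP and LOR ratings of the two groups. The following is a minimal sketch, using only the five df.head() rows as sample data (one applicant without research, four with), so it illustrates the calculation and not the result on the full dataset.

```python
import pandas as pd

# Five rows copied from the df.head() output of this notebook (not the full dataset).
sample = pd.DataFrame({
    "SOP": [4.5, 4.0, 3.0, 3.5, 2.0],
    "LOR ": [4.5, 4.5, 3.5, 2.5, 3.0],   # trailing space, as in the notebook
    "Research": [1, 1, 1, 1, 0],
})

# Mean SOP and LOR rating per research group (0 = no research, 1 = research).
means = sample.groupby("Research")[["SOP", "LOR "]].mean()
print(means)
```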

To Summarize:¶

CGPA plays the most important role in admissions, followed by GRE score and TOEFL score.
Good SOPs and LORs are essential to get into the best universities.
Research experience goes along with better SOPs and LORs.
Students with a high CGPA generally do well on the GRE and TOEFL.

Disclaimer: This is a very small dataset, and the comments above are based only on the data given. Further analysis would need more data points and features.

Output of display(df.head()):
	Serial No.	GRE Score	TOEFL Score	University Rating	SOP	LOR	CGPA	Research	Chance of Admit
0	1	337	118	4	4.5	4.5	9.65	1	0.92
1	2	324	107	4	4.0	4.5	8.87	1	0.76
2	3	316	104	3	3.0	3.5	8.00	1	0.72
3	4	322	110	3	3.5	2.5	8.67	1	0.80
4	5	314	103	2	2.0	3.0	8.21	0	0.65

Output of display(df.describe()):
	Serial No.	GRE Score	TOEFL Score	University Rating	SOP	LOR	CGPA	Research	Chance of Admit
count	500.000000	500.000000	500.000000	500.000000	500.000000	500.00000	500.000000	500.000000	500.00000
mean	250.500000	316.472000	107.192000	3.114000	3.374000	3.48400	8.576440	0.560000	0.72174
std	144.481833	11.295148	6.081868	1.143512	0.991004	0.92545	0.604813	0.496884	0.14114
min	1.000000	290.000000	92.000000	1.000000	1.000000	1.00000	6.800000	0.000000	0.34000
25%	125.750000	308.000000	103.000000	2.000000	2.500000	3.00000	8.127500	0.000000	0.63000
50%	250.500000	317.000000	107.000000	3.000000	3.500000	3.50000	8.560000	1.000000	0.72000
75%	375.250000	325.000000	112.000000	4.000000	4.000000	4.00000	9.040000	1.000000	0.82000
max	500.000000	340.000000	120.000000	5.000000	5.000000	5.00000	9.920000	1.000000	0.97000