Guidelines for Data Analysis


According to a study published in US News and World Report in 2010, the cost of
medical malpractice in the United States is $55.6 billion a year, or 2.4 per cent
of annual health-care spending. A 2011 study published in the New England Journal
of Medicine found that annually, over the period 1991 to 2005, 7.4 per cent of all
physicians licensed in the US had a malpractice claim against them. These costs
add to the high cost of health care in the US, and the size of successful
malpractice claims also pushes up premiums for medical malpractice insurance.

A McKinsey report, “Unleashing the Value of Advanced Analytics in Insurance”
(May 2014), states:

“The proliferation of
third-party data sources is reducing insurers’ dependence on internal data.
Digital “data exhaust” from social media and multimedia, smartphones,
computers, and other consumer and industrial devices — used within privacy
guidelines and assuring anonymity — has become a rich source for behavioural
insights for insurance companies, as it has for virtually all businesses.

The release of previously unavailable or inaccessible public sector data has
greatly expanded potential sources of third-party data. The US and UK
governments and the European Union have recently launched “open data”
Web sites to make available massive amounts of government statistics, including
health, education, worker safety, and energy data, among others. With much
better access to third-party data from a wide variety of sources, insurers can
pose new questions and better understand many different types of risks.”

UnitedHealth Group, America’s most prominent health insurance provider, has
collated a range of data and wants to better understand the claims it has paid
out for medical malpractice lawsuits. Its records show claim payment amounts, as
well as information about the presiding physician and the claimant, for many
lawsuits that were mediated or settled this year.

You are a Data Analyst working for UnitedHealth Group. Your manager, Edmond
Kendrick, has asked you to conduct a preliminary analysis of the collected data.
In particular, you are expected to perform a series of descriptive and
inferential analyses and produce a report based on your findings.

The data table contains a claimant ID and eight variables describing each claim,
as listed below:

Variable          Description
Claimant ID       Unique ID of the claimant
Amount            Amount of the claim payment in dollars
Severity          Severity rating of the damage to the patient (MILD, MEDIUM, SEVERE)
Age               Age of the claimant in years
Private Attorney  Whether the claimant was represented by a private attorney
Marital Status    Marital status of the claimant
Specialty         Specialty of the physician involved in the lawsuit
Insurance         Type of medical insurance carried by the patient
Gender            Gender of the patient

Edmond’s email to you is reproduced below.


Email from Edmond Kendrick

To:       <<Your Name>>

From:     Edmond Kendrick

Subject: Analysis of Claims (updated)


As discussed earlier, I have cleaned and
simplified the dataset to eight variables for your convenience. The cleaned
dataset contains information about 200 randomly selected claims made this year.

  1. I would like to compare this year’s claims data against several other …
    • Is there a difference in the proportion of “MILD” or “MEDIUM” claims by the
      patient’s gender? Can we conclude that the proportion of “MILD” or “MEDIUM”
      severity claims differs between female and male patients?
    • As an industry standard, it is believed that payment amounts are related to
      whether a private attorney represented the claimant. In particular, the
      average claim amount is believed to be higher when a private attorney is
      involved than when no private attorney is involved. Does the data support
      this proposition?
    • Industry stakeholders also believe that private attorney representation is
      more common for “SEVERE” claims than for claims with “MEDIUM” severity. Is
      this a valid statement?

  2. The insurance company is particularly concerned about “SEVERE” claims,
    because their amounts are significantly higher than those of other claims.
    Therefore, I would like to understand the relationship between the specialty
    of the physician involved, the severity of the claim, and the average claim
    amount.
    • I believe that the percentage of “SEVERE” claims is lower when an
      Orthopaedic surgeon is involved than for all other specialists.
    • I also believe that the average claim amount for “SEVERE” claims is higher
      when an Orthopaedic surgeon is involved than for all other specialties.
    Is there any evidence to support my assertions?

  3. I would like to expand the analysis further and look at whether:
    • the average claim amount differs significantly across the claimant’s
      marital status;
    • the average claim amount differs significantly across physician
      specialties; and
    • the proportion of claimants represented by a private attorney differs
      significantly across the claimant’s marital status.

  4. Using the data in the “Experiment” worksheet of the attached Excel file, I
    would like you to design and run an experiment to examine the effect of
    private attorney representation and insurance type on the amount claimed.

  5. The ability to submit work on time is a highly sought-after skill at
    UnitedHealth. As part of your professional development, I would like you to
    report back on how you plan and organise your work at UnitedHealth as a data
    analyst.

I look forward to your response.


Edmond Kendrick

Chief Data Scientist – UnitedHealth Group
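
The email’s question on whether the proportion of claimants represented by a private attorney differs across marital status compares proportions across more than two groups, which is commonly handled with a chi-square test of independence. The assignment expects this to be done with the Excel templates; the stdlib-only Python sketch below only illustrates the arithmetic. The function names and the table of counts are hypothetical and are not taken from the data set.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    # Closed form for df = 1 (odd chain) or df = 2 (even chain), then step up by 2 using
    # Q(x; k + 2) = Q(x; k) + (x / 2) ** (k / 2) * exp(-x / 2) / Gamma(k / 2 + 1).
    if df < 1:
        raise ValueError("df must be a positive integer")
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def chi_square_independence(table):
    """Chi-square test of independence for a table given as a list of rows of counts.

    Returns (statistic, degrees of freedom, p-value, smallest expected count).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat, min_expected = 0.0, math.inf
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df), min_expected


# Hypothetical counts: rows = private attorney (yes, no), columns = three made-up marital-status groups.
counts = [[30, 25, 20], [20, 30, 25]]
stat, df, p_value, min_expected = chi_square_independence(counts)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value:.4f}, smallest expected = {min_expected:.1f}")
```

In Excel, CHISQ.TEST(actual_range, expected_range) returns the p-value from the observed and expected tables. A common rule of thumb is that every expected count should be at least 5 for the chi-square approximation to be reliable, which is why the sketch also returns the smallest expected count.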


The assignment
consists of two parts: Analysis and Report. You are
required to submit both your written report and your analysis.

Guidelines for Data Analysis

Read the case study and Edmond’s questions carefully. Then spend some time
reviewing the data to get a sense of the context. The analysis required for this
assignment draws on material covered in Module 1, and the corresponding tutorials
are a useful guide.

Submit your analysis in the appropriate worksheets of the Excel file. Analyse each
question from the email in a separate tab (e.g. Q1, Q2 … or Q3.1, Q3.2 …); you
need to add these tabs yourself. Before submitting your analysis, make sure it is
logically organised and that any incorrect or unnecessary output has been
removed. Marks will be deducted for poor presentation, disorganised or incorrect
results, or any unnecessary output.

For all questions in the email, you can assume that:

  • a 95 % confidence level is appropriate for confidence intervals; and
  • a 5 % level of significance (i.e. α = 0.05) is appropriate for any
    hypothesis tests.
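
To show how the 95 % confidence level and α = 0.05 enter a calculation, here is a minimal Python sketch (the assignment itself expects the Excel templates) of a comparison of two proportions, the kind of test the question on “MILD” or “MEDIUM” claims by gender calls for. It assumes the large-sample normal approximation: a pooled z-test of H0: p1 = p2 and an unpooled interval for p1 - p2. The function name and the counts in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(x1, n1, x2, n2, alpha=0.05):
    """Pooled two-sided z-test of H0: p1 = p2, plus a (1 - alpha) CI for p1 - p2.

    x1, x2: number of "successes" (e.g. MILD or MEDIUM claims) in each group
    n1, n2: group sizes
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2

    # Test statistic: pooled proportion, because H0 assumes both groups share one p.
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se_pooled == 0:
        raise ValueError("pooled proportion is 0 or 1; the z-test is undefined")
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided alternative

    # Confidence interval: unpooled standard error; z_crit is about 1.96 when alpha = 0.05.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return {"diff": diff, "z": z, "p_value": p_value, "reject_h0": p_value < alpha, "ci": ci}


# Hypothetical counts only: 60 of 100 claims in group 1 versus 40 of 100 in group 2.
result = two_proportion_test(60, 100, 40, 100)
print(result)
```

For a one-sided alternative such as “lower than” (as in the Orthopaedic assertion), the p-value would be the lower-tail probability NormalDist().cdf(z) instead of the two-sided value; Excel’s NORM.S.DIST(z, TRUE) gives the same lower-tail probability.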

You can complete all data analysis using the Excel templates provided
in the assignment data file. In choosing the technique to apply for a given
question, keep the following in mind:

  • Are we dealing with a numerical or a categorical variable?
  • Are we dealing with one sample, two samples, or more than two samples?
  • Are we dealing with independent samples or paired samples?
  • Answer each question using the most appropriate technique, and justify your
    choice where applicable.
  • For relevant questions, formulate your hypotheses and state them clearly,
    both in notation and in words.
  • Even when a question leads you to inferential techniques, consider
    conducting a descriptive analysis of the sample data first.
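
The first three questions in the checklist can be read as a small decision table. The Python sketch below encodes one common textbook mapping from (variable type, number of groups, paired or not) to a technique; the function name and the wording of the labels are illustrative assumptions, not part of the assignment, and the final choice still needs the justification the checklist asks for.

```python
def choose_technique(variable, groups, paired=False):
    """Suggest a standard technique (illustrative labels only).

    variable: "numerical" or "categorical"
    groups:   number of samples or groups being compared (1, 2, 3, ...)
    paired:   True when two sets of observations come from the same units
    """
    if variable not in ("numerical", "categorical"):
        raise ValueError("variable must be 'numerical' or 'categorical'")
    if groups < 1:
        raise ValueError("groups must be at least 1")

    if variable == "numerical":
        if groups == 1:
            return "one-sample t-test or t-interval for the mean"
        if groups == 2:
            if paired:
                return "paired t-test on the differences"
            return "two-sample t-test (equal or unequal variances)"
        return "one-way ANOVA, then multiple comparisons"

    # Categorical variable: compare proportions or counts.
    if groups == 1:
        return "z-test or z-interval for one proportion"
    if groups == 2:
        if paired:
            return "McNemar test for paired proportions"
        return "two-proportion z-test (or chi-square test)"
    return "chi-square test of independence, then multiple comparisons"


# Usage: average claim amount with versus without a private attorney (two independent groups).
print(choose_technique("numerical", 2))
```

Comparing average claim amounts with and without a private attorney is a numerical variable with two independent groups, so the helper points to a two-sample t-test; comparing average amounts across several physician specialties points to one-way ANOVA.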

  • If you have established that there is a difference between two means or
    proportions, we expect you to estimate and report the difference.
  • If you have established that there is a difference among more than two
    means or proportions, we expect you to follow up with an appropriate multiple
    comparison procedure.
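
When more than two means are compared, for example the average claim amount across marital statuses or physician specialties, a significant one-way ANOVA is typically followed by pairwise comparisons. The Python sketch below computes the ANOVA F statistic and, as one simple multiple comparison option, pairwise t statistics based on the pooled MSE with a Bonferroni-adjusted α; the Excel templates used in the course may apply a different procedure (for example Tukey-Kramer). The function names and group data are hypothetical, and the p-value for F and the critical t value are left to Excel’s F.DIST.RT and T.INV.2T.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def one_way_anova(groups):
    """One-way ANOVA for a dict {group label: list of values}.

    Returns (F statistic, df between, df within, MSE).
    """
    means = {g: mean(v) for g, v in groups.items()}
    all_values = [x for v in groups.values() for x in v]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)

    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df_between, df_within = k - 1, n_total - k
    mse = ss_within / df_within
    f_stat = (ss_between / df_between) / mse
    return f_stat, df_between, df_within, mse


def bonferroni_pairs(groups, mse, alpha=0.05):
    """Pairwise mean differences and t statistics using the pooled MSE.

    Returns (Bonferroni-adjusted alpha, list of (group a, group b, difference, t)).
    Compare each |t| with the two-tailed critical value for the adjusted alpha
    and df_within, e.g. T.INV.2T(adjusted_alpha, df_within) in Excel.
    """
    pairs = list(combinations(groups, 2))
    adjusted_alpha = alpha / len(pairs)
    rows = []
    for a, b in pairs:
        diff = mean(groups[a]) - mean(groups[b])
        se = sqrt(mse * (1 / len(groups[a]) + 1 / len(groups[b])))
        rows.append((a, b, diff, diff / se))
    return adjusted_alpha, rows


# Hypothetical claim amounts (in dollars) for three made-up groups.
data = {
    "Group A": [52000, 61000, 58000, 49000],
    "Group B": [73000, 69000, 80000, 77000],
    "Group C": [55000, 60000, 57000, 63000],
}
f_stat, df1, df2, mse = one_way_anova(data)
adjusted_alpha, comparisons = bonferroni_pairs(data, mse)
print(f"F = {f_stat:.2f} on ({df1}, {df2}) df; Bonferroni alpha = {adjusted_alpha:.4f}")
for a, b, diff, t in comparisons:
    print(f"{a} - {b}: difference = {diff:,.0f}, t = {t:.2f}")
```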

You may need to make certain assumptions about the data set to answer some
questions. For other questions, you will need to make technical or statistical
assumptions; for example, whether to use an equal-variance or an
unequal-variance t-test. Discuss any violated assumptions, and related issues
such as unequal sample sizes, as limitations of your analysis in your report.
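
The equal- versus unequal-variance choice mentioned above changes both the standard error and the degrees of freedom. Below is a minimal Python sketch of both versions for two independent samples, such as claim amounts with and without a private attorney; the function name and the sample values are hypothetical. In Excel, T.TEST(array1, array2, tails, type) with type 2 (equal variance) or type 3 (unequal variance) returns the corresponding p-value directly.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(x, y):
    """t statistics for mean(x) - mean(y) from two independent samples.

    Returns a dict with the difference and (t, df) for the pooled
    (equal-variance) and Welch (unequal-variance) versions.
    """
    n1, n2 = len(x), len(y)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    diff = mean(x) - mean(y)
    v1, v2 = variance(x), variance(y)  # sample variances (n - 1 denominator)

    # Equal variances: pooled variance estimate, df = n1 + n2 - 2.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / sqrt(sp2 * (1 / n1 + 1 / n2))

    # Unequal variances (Welch): separate variances, Welch-Satterthwaite df.
    a, b = v1 / n1, v2 / n2
    t_welch = diff / sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

    return {"diff": diff, "pooled": (t_pooled, n1 + n2 - 2), "welch": (t_welch, df_welch)}


# Hypothetical claim amounts in dollars: with versus without a private attorney.
with_attorney = [82000, 95000, 71000, 88000, 102000]
without_attorney = [64000, 58000, 73000, 61000]
print(two_sample_t(with_attorney, without_attorney))
```

Because the industry proposition is directional (higher with an attorney), it calls for a one-tailed test: with the difference computed as attorney minus no attorney, the p-value is the upper-tail probability, e.g. T.DIST.RT(t, df) in Excel. When sample sizes and variances are quite unequal, the Welch version is generally considered the safer choice.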

Guidelines for the Technical Report

Once you have completed your data analysis, summarise the key findings for each
question and write a response to Edmond in report format. Your technical report
consists of four sections: Introduction, Main Body, Conclusion, and Appendices.
The report should be around 1,500 words.

Use proper headings (e.g. Q1, Q2 … or Q3.1, Q3.2 …) and titles in the main body
of the report. Use sub-headings where necessary.


  • Include relevant Excel outputs, including templates, tables, charts, and
    graphs, in the Appendices (appendices are not included in the word count).
  • Make sure the outputs in the Appendices are visually appealing, use a
    consistent formatting style, have proper titles (chart title, axis titles,
    etc.), and are numbered correctly. Where necessary, refer to these outputs in
    the main body of the report. If an output, graph or chart is not referred to
    in the body of the report, do not include it in your appendices.
  • The introduction should begin by highlighting the main purpose(s) of the
    analysis and end by explaining the structure of the report (i.e., the
    subsequent sections).
  • The conclusion should highlight the key findings of the analyses and explain
    the main limitations (if any).
