LMI For All

Documentation & Development

User Tools

Site Tools


Sidebar

Start Pages

Team Pages

Upcoming Events

Apr 24 Modding Day
data:specsheet_ashepay

Currently being edited

ASHE/LFS Pay (Mean Weekly Pay)

  • Submitted by: Luke Bosworth, l.p.bosworth@warwick.ac.uk
  • Submitted on: 22/04/2016
  • Revision; 3

Data File(s)

This data load contains 5 files, which are as follows:

File Content Size (bytes)
ashe_pay_main.csv Average weekly pay including overtime 164,739,426
ashe_age_coeffs.csv Parameters to generate estimates by age “on the fly” 26,596
ashe_age_values.csv Parameters to generate estimates by age “on the fly” 1,132,352
ashe_median_deciles.csv Median and deciles estimates based on normal distribution of log (Pay), based on LFS, for FT and PT separately 164,739,426
wf_occupations.csv Occupational id, codes and descriptions 15,304

Source Dataset

The Pay estimates are based on a combination of data from ASHE (Annual Survey of Hours and Earnings) and the Labour Force Survey (LFS). The most recent ASHE available to access in the Secure Data Lab was ASHE 2017 at the time of this updating work, thus ASHE data from 2016 and 2017 and LFS data from 2017 and 2018 were used for the generation of 2018 pay estimates. The assumption is that the estimated pay based on ASHE 2016 and 2017 is not hugely different from the estimated pay from ASHE 2017 and 2018. Thanks are due to the Secure Data Service at the UK Data Archive for providing access to the Annual Survey of Hours and Earnings (ASHE) data to enable the econometric analysis on which these numbers are based.

Distinctions between Pay, Wage and Earnings:

Pay means the money paid to someone for their regular work. Wages is a fixed regular payment earned for providing work or services to a company. Earnings can mean money received from wages or money received from non-earned income sources. The term “earnings” is interchangeable with wages when it means money obtained for hourly or salary paid work. Gross weekly Pay is used in all the Pay estimations based on LFS and ASHE in this study. Here the term Pay is generally used, although following standard conventions the term “earnings equation” is used to refer to the econometric equation estimated to predict Pay. The Pay variable used in the LFS is “GRSSWK”. It is the gross weekly Pay before deductions in an individual’s main job. It applies to employees and those on a government scheme but not those employed on New Deal, in the voluntary sector, or the environmental task force. The Pay variable used in ASHE is “GPAY”. It is the average gross weekly earnings in the reference period from either the main job or another job. Its main components are basic gross weekly earnings and allowances. The other components include overtime payments, incentive/bonus payments that relates to this Pay period, and additional premium payments during the Pay period for shift work and night or weekend work not treated as overtime.

General description of the Pay data

The use of “raw” data from the LFS or ASHE in the LMI for All data portal is limited due to sample size and concerns about confidentiality. Reliance on the “raw” data would result in huge gaps in the information available to be presented in the portal. To get around these limitations the portal uses “predicted Pay” estimates, based on an econometric analysis of the ASHE and LFS data sets. The econometric analysis uses a standard “Mincerian” Earnings Equation (the “main” earnings equation as described below). In order to provide additional details by age, as well as features of the distribution of Pay such as deciles, supplementary equations are also used. Full details of the approach can be found in Li and Wilson (2016). Compared to the LFS, ASHE has the advantages in that it has more reliable Pay information which is provided by employers rather than individuals and it has a larger sample size than LFS. However, information on individual characteristics is more limited in ASHE and it does not have any information on education or qualification. In order to get around these problems, the LMI for All database is based on a set of estimates/predictions of Pay using data from both ASHE and LFS. The predicted Pay estimates in the 'PayFTPT.lasc' file are generated using the main earnings equation. These initial predictions are adjusted using an iterative RAS procedure to match the published Pay figures from ASHE and the LFS across each of the main dimensions/characteristics (gender, region, industry, occupation and qualification). Again details can be found in Li and Wilson (2016). In order to generate predictions of Pay by age in the database supplementary age equations are estimated for full-time and part-time workers separately based on the assumption that the impact of various factors affect full-time and part-time workers in a different way.These results are then use to predict Pay by age based on the mean value of Pay for all ages. Similarly, predicted median and decile Pay levels are based on parametric methods and the assumption that Pay is log-normally distributed. There is no Pay data available for the occupation 'Armed Forces'. There is, therefore, no employment data for 'Armed Forces' in the 'ashe_pay_main.csv' file. However 'Armed Forces' employment is included in the main Working Futures employment data (WFDataOcc4Digit.csv).

The pay file provided (ashe_pay_main.csv) includes columns Employment and PayBill. The former is the employment data that should be used as weights. This is intended to be “self-contained”. There is no need to use the separate Working Futures data upon which it is based. The employment weights used are a subset of the Working Futures employment time series data set, but contain data for just one year. They differ from the Working Futures data in various other minor respects (for example they omit Armed Forces).

These data should be used as employment weights for anything to do with Pay or Hours in the ASHE section of the LMI for All database.

The reference to Working Futures in the documentation is intended simply to explain how the weights have been derived.

The Main Earnings Equation

The predictions of mean Pay are based on an econometric analysis of data from both ASHE and the LFS. In both cases a standard earnings equation based on the approach of Mincer is used (see Li and Wilson (2016) for details). This is referred to as the “Main” earnings equation (to distinguish it from various supplementary equations used to generate estimates by age and to produce estimates of deciles). ASHE does not include information on qualification, so a second equation is estimated using LFS data to estimate the marginal impact of qualifications on Pay. A linear earnings function with a quadratic term for age indicating the impact of changes of age on Pay is estimated using the ordinary least square method. The natural log of gross weekly Pay is the dependent variable. Separate equations are estimated for full-time and part-time workers. The equations are also estimated separately using ASHE and LFS data. The independent variables included, and their definitions, are as follows:

  • Age: a continuous variable ranging from 16 to 84;
  • Age squared: continuous variable;
  • Gender: male and female, 1 dummy variable for male (base category: female)
  • Region: 12 government official regions of England or devolved countries within the UK, 11 dummy variables in the regression (base category: London)
  • Highest qualification (LFS equations only): variable distinguishes highest qualification held. There are 9 categories (QCF1-8 and no qualification) and 8 dummy variables in the regression (base category: QCF8)
  • Industry: standard 75 categories as used in Working Futures, 74 dummy variables in the regression (base category: Agriculture, etc.);
  • Occupation: 4-digit SOC2010, 369 categories and 368 dummy variables in the regression (base category: 115 Chief executives and senior officials).

Interactive terms are included to detect heterogeneity across different groups:

  • Gender by occupation: gender is interacted with 4-digit occupation categories to control Pay differences between male and female within each occupation. The base group is female Chief executives and senior officials.
  • Industry by time trend: a time trend variable is created for 2017 and 2018 in the LFS analysis and for 2016 and 2017 in the ASHE analysis. It is interacted with industries to control time trend differences within each industry. The base groups are industries in 2017 in the LFS estimation and industries in 2016 in the ASHE estimation.
  • Occupation by time trend: the time trend is also interacted with occupations to control time trend differences within each occupation. The base groups are occupations in 2017 in the LFS analysis and occupations in 2016 in the ASHE analysis.

Using ASHE data separate equations are first estimated for full-time (FT) and part-time (PT) workers. A second set of equations are estimated using LFS data. These include a variable indicating the highest qualification held by the individual. The estimated coefficients of the independent variables and the constant term are then used to derive the expected Pay for an individual with certain characteristics (as defined by the variables included). The default reference group in the LFS is female workers living in London with highest qualification QCF8 working in the Agriculture sector and are Chief executives or senior officials in 2017 and the default reference group in the ASHE is the same group of people in 2016. The log expected Pay for an individual with these default characteristics at certain age can be calculated by adding the following parts together: coefficient on age times age; coefficient on age square times age squared; plus the coefficient for the constant term. The calculation of log expected Pay for people with other characteristics is simply made by adding coefficients for relevant dummy variables and interaction terms to this default log expected Pay. For example, for a male worker with all the other same characteristics as default, his log expected Pay is the default log expected Pay plus the estimated coefficient of the male dummy. To obtain the expected Pay, the log numbers need to be converted back to Pay by exponentiating

Pay=EXP (log expected Pay).

These predictions of pay by the various characteristics identified in the list of independent variables above form the basis for the initial estimates included in the file 'ashe_pay_main.csv’ .The predictions are made for those of average age in the category concerned. The estimates of the average age for each category or combination are based on data from the LFS. The corresponding information based on ASHE is not available due to the disclosure risks of small sample size in each combination. An assumption of similar age distribution in LFS and ASHE is therefore made.

Constraints to match published data

In order to ensure the final Pay estimates are consistent with the published data the detailed data base in 'ashe_pay_main.csv’ is adjusted using RAS iterative techniques to match published official data. The Pay estimates multiplied by the corresponding employment numbers when summed match the corresponding ASHE totals at the 4 digit level occupation. For further details see Li and Wilson (2016).

Estimates of Pay by Age

In order to generate predictions of Pay by age and provide an indication of how Pay of each age category varies from mean Pay, “supplementary age equations” have been estimated. These enable typical ratios between Pay of a particular age category and mean Pay of all ages for a particular combination (defined by four dimensions including occupation, gender, full-time or part-time working and the highest level of qualification) to be calculated. To ensure a reasonably large sample size of each combination for feasible Pay estimation, occupation has been set at the 1-digit level (covering 9 categories). Patterns at a more detailed level are assumed to be the same. To provide information for all possible aggregates, an extra category for all occupations has also been included. The 10 occupational categories are the 1 digit level of SOC2010 as follows: 1. Managers and senior officials 2. Professional occupations 3. Associate professional and technical occupations 4. Administrative and secretarial occupations 5. Skilled trades occupations 6. Personal service occupations 7. Sales and customer service occupations 8. Process, plant and machine operatives 9. Elementary occupation 10. All occupations

10. For queries relating to more detailed levels of SOC2010 the corresponding 1 digit level values should be used. For the same reason, the highest level of qualification held has been classified into three broad groups, plus an aggregated group for all qualifications. These relate to the more detailed ;levels of QCF as follows: 1. High (QCF4-8) 2. Medium (QCF1-3) 3. Low (No Qualification) 4. All qualifications

Again for queries relating to more detailed levels of QCF the corresponding broad category values should be used.

Gender and full-time or part-time workers both have two categories and an aggregated total. For gender:

  • 0. Female
  • 1. Male
  • 2. Total

Similarly for full-time/part-time:

  • 0. Part-time
  • 1. Full-time
  • 2. Total

Together, all the four dimensions give 360 possible combinations including totals and sub-totals. Industry is not included in the analysis because the sample size tends to get very small once industry is considered. We assume patterns by age do not vary significantly once the 4 main dimensions above (gender, full-time or part-time working, occupation and qualification) have been taken into account. The main objective of the supplementary age equations is to provide a descriptive summary of how Pay is likely to vary with age. The factors used in the supplementary equations include age and age squared reflecting the most common finding in the literature that the relationship between age and Pay is generally an inverted U-shaped with Pay peaking in middle age and declining smoothly thereafter. The functional form of the supplementary age equation used is:

Pay(Age)=a+b*Age+c*Age²+u					(1)

where:

  • Pay (age) is the average gross weekly Pay of individuals of the same age in a particular combination calculated from the LFS data.
  • a is the constant term, b and c show how the gross weekly Pay varies by age, and u is the error term.

The regression of equation (1) is performed for each of the 360 combinations to derive a, b and c. The expected Pay at each age from 20 to 65 is calculated by applying the estimated coefficients (a, b and c) to the value of age and age squared. (Note: some ages are missing in some combinations. The estimated coefficients are applied to those ages to derive their expected Pay). To provide an indication of how the expected Pay at each age between 20 and 65 is distributed around the mean Pay of all ages in each combination, the mean Pay of all ages in each of the 360 combinations is then calculated from the LFS data. A ratio (R) between the predicted Pay of a particular age and mean Pay of all ages in a combination is derived to indicate the distance of Pay of a particular age from mean Pay. The ratios calculated for each age category then enable a prediction of Pay by age around the mean Pay to be made for LMI for All “on the fly”:

PredictedPay(age)=R*meanpay
  • PredictedPay is the predicted Pay for an individual of an age between 20 to 65 in the category of interest (a particular combination of characteristics defined by occupation at 1-digit level, gender, full-time or part-time worker, and three-category qualification).
  • R is the ratio for: (predicted Pay of an age)/(mean Pay of all ages) in the same category.
  • meanpay is the average (or mean) level of Pay for all age in the category of interest from the file 'ashe_pay_main.csv'. (Note that the mean Pay provided in the file 'ashe_pay_main.csv' are based on a combination of LFS and ASHE data, so they are slightly different from what weree used to estimate equation (1) which only use data from the LFS).

16560 ratios are calculated for 360 combinations between age 20 and 65. For a query about Pay estimates, the four dimensions (occupation, gender, full-time or part-time and qualification) are used to identify the corresponding combination. Which ratio to use within this combination depends on which age the query is intended for. If the query does not involve these dimensions, for example a query about a particular industry or region, the ratio for “ALL” (or some other relevant sub-total) is used.

Estimates of Medians and Deciles

In order to generate predictions of Pay medians and deciles, supplementary “distribution equations” are also used, based on analysis of both LFS and ASHE data. In this case the analysis is based on an assumption of Pay being log-normally distributed. The file 'ashe_median_deciles.csv' contains mean/median ratios and (standard deviation)/mean ratios for full-time and part-time employees separately, based on analysis of data from ASHE. The estimates of the Median and deciles assume a log-normal distribution, using the mean and standard deviation estimated from the source data (LFS or ASHE). The formula used to compute median and other deciles based on the (log) normal distribution is referred to as the “distribution equation” and is set out in equation (2) and (2a):

Median Pay = Mean Pay * (1/r)    (2)
Median or deciles (log Pay) = Median (log Pay) +z*σ      (2a)

r = ratio of Mean to Median Pay (distinguishing occupation, gender and full–time and part-time status) calculated based on ASHE. Ratios of Mean to Median pay are generated at 1-digit, 2-digit, 3-digit and 4-digit occupation levels.

For any detailed enquiry the corresponding Median Pay estimate is generated by assuming the same r value applies across all other dimensions of the database.

Median (log Pay) equals Mean (log Pay) when log Pay is normally distributed, σ is the corresponding standard deviation of log Pay. The value z measures how far the desired decile or median is located away from the mean in a normal distribution, or in other words, how many standard deviations it is away from the mean.

Note that the mean, median or deciles mentioned in the remainder of this section all refer to the log of Pay. In order to estimate median and deciles based on z scores and standard deviations, an assumption of a normal distribution is required. The raw Pay data is not normally distributed, thus this approach cannot be used directly to generate such estimates. However taking the log of the Pay data transforms the distribution to a log normal one, which can be subsequently used to estimate median and deciles. The z score is known for any specified deciles or median and can be obtained from the standard normal cumulative probability table. Given mean Pay, the standard deviation and z score, equation (2) can be used to predict median Pay and any of the other deciles.

The z scores for the median and different deciles are as follows:

Deciles 10 20 30 40 50 60 70 80 90
z scores -1.28 -0.84 -0.52 -0.25 0 0.25 0.52 0.84 1.28

(source )

The 'ashe_median_deciles.csv' file also contains estimates of typical values of the ratios of the standard deviation of log Pay to the mean value for each of the main categories of interest (4 digit occupations and status (FT/PT). The focus is on how median Pay (and other deciles) vary around mean Pay. This assumes that the Pay distributions are otherwise the same across other dimensions such as gender, industry, region, qualification, etc. The formula used to compute median and other deciles based on the (log) normal distribution is as shown in Equation (2) above. Ideally estimates of σ are needed for all the main dimensions, but limitations of sample size in both ASHE and the LFS imply this is impossible for all possible permutations and combinations. Typical values are therefore assumed, based on variations across the main dimensions of interest (but not all possible cross dimensions). The focus is on variations by status (FT/PT) and occupation since inspection of the data suggests this is where the variations are greatest. Values of σ have therefore been estimated across these main dimensions and similar patterns are assumed to apply across all other dimensions for the purpose of this calculation. The assumption adopted is that the ratio of σ to the mean value is fixed across all the other dimensions. Using these ratios (which are differentiated by 4 digit occupation and full-time or part-time status) values of σ are generated within the API for all possible permutations and combinations. These estimated standard deviations are then used “on the fly” to create the prediction of median Pay and other deciles from the mean values extracted from 'ashe_pay_main.csv'. The ratios of σ to the mean value are provided in the 'ashe_median_deciles.csv' file. Ratios for totals and sub-totals of occupations at 1-digit, 2-digit, 3-digit and 4-digit occupation levels and full-time or part-time status are also calculated. If the query does not involve occupation or full-time/part-time dimensions, the ratio for “ALL” ” (or some other relevant sub-total) should be used.

References

Li, Y. and R.A.Wilson (2016) Developing a Careers LMI Database: Technical Report on Generating Detailed Estimates of Pay and Hours. Institute for Employment Research , University of Warwick, Coventry.

Fields and Columns

Mean Pay by SOC2010 4-digit category (ashe_pay_main.csv)

The 'ashe_pay_main.csv' file contains information on the mean pay for those in each of the combinations defined by the following dimensions:

  • year - (2018)
  • gender - 2 (male and female)
  • status - 2 (full-time and part-time)
  • industry - 75 standard industries (see classification and aggregation below)
  • occupation - 369 4-digit SOC 2010 categories
  • geography - 12 Countries and English regions
  • qualification - 9 National Qualification Framework levels

Two columns have the information needed to calculate mean pay:

  • Employment - the relevant total employment number that should be used for weighting
  • PayBill - The Total (weekly) Pay bill for the category concerned (Pay*Employment)

Mean pay is PayBill divided by Employment. Aggregating involves summing these two columns separately and and dividing the results to get mean pay.

ashe_age_coeffs.csv/ ashe_age_values.csv

In order to generate predictions of Pay by age and provide an indication of how Pay of each age category varies from mean Pay, “supplementary age equations” have been estimated. These enable typical ratios of Pay of a particular age category compared to mean Pay value of all ages for particular combinations to be calculated. This is done for four main dimensions including occupation, gender, full-time or part-time working and the highest level of qualification. Pay of a particular age category is predicted using the parameters from a regression. The file 'ashe_age_coeffs.csv' & 'ashe_age_values.csv' contains the ratios necessary to compute Pay by age for different combinations based on ASHE.

ashe_median_deciles.csv

In order to generate predictions of pay medians and deciles, “Supplementary” equations are again used, (see equation (2) above). The file contains parameters necessary to compute median and deciles, for full-time and part-time employees separately, based on LFS data. The gross weekly pay is (naturally) logged to be transformed into a normal distribution.

The 'ashe_median_deciles.csv' file contains:

  • z scores (as in the table above)
  • σ / mean ratios for FT, PT and occupations
  • Mean/median ratios for FT, PT, Males, Females and occupations

The Mean/median ratios, σ / mean ratios and z, together with the mean values, allow generation of estimates of medians and deciles across the four main dimensions including occupations, gender, full-time/part-time and qualifications. Note the need to make the log transformation. The data in the ashe_pay_main.csv file are NOT logged.

Output

The 'ashe_pay_main.csv' file contains predictions/estimates for mean Pay for each of the 369 SOC2010 Unit Groups, broken down by the other main dimensions (including industry, region, gender and qualification). These predictions or estimates are as described in the subsection above on General description of the Pay data. The estimates are based on a combination of LFS and ASHE data. They include separate estimates for both full-time (FT) and part-time (PT) employees.

Queries and calculations

Pay Data Specification (ashe_pay_main.csv)

The first column is the year. The second to seventh cover gender, status( FT/PT), industry, occupation, geography and highest qualification held. These show the characteristics of people covered by the dataset. The penultimate column provides the corresponding employment number. The final column, 'PayBill', represents the mean pay per week multiplied by Employment for the category with the characteristics in columns two to seven.

FIELDS AND COLUMNS

  1. year (2015)
  2. gender (male, female)
  3. status (FT, PT)
  4. industry (75 Working Futures industries)
  5. occupation (369 4 digit occupations)
  6. geography (countries or English region)
  7. qualification (9 QCF levels, highest qualification held)
  8. Employment
  9. PayBill

Ratios for estimating Pay deciles (Median and deciles.xlsx)

FIELDS AND COLUMNS

=Worksheet ‘info’= The worksheet 'info' within the 'Median and deciles.xlsx' workbook includes the: z score values (in column B) for deciles 10-90. The median value is the fifth decile (row 6). =Workbook ‘SDMean’=

The sheet 'SDMean' contains the σ /mean ratios, for occupations, for FT and PT (and totals Note that the z values are the same for all cases. The log and real numbers for the medians and deciles are generated “on the fly” using equations (2) and (2a). Full-time Workers Column B includes the σ /mean ratios for all 369 4 digit occupations (rows 4-372) for full-time workers. The ratio for all occupations is in row 373. Column G includes the σ /mean ratios for all 89 3 digit occupations (rows 4-92) for full-time workers. Column M includes the σ /mean ratios for all 25 2 digit occupations (rows 4-28) for full-time workers. Column S includes the σ /mean ratios for all 9 1 digit occupations (rows 4-12) for full-time workers. Part-time Workers Column C includes the σ /mean ratios for all 369 4 digit occupations (rows 4-372) for part-time workers. The ratio for all occupations is in row 373. Column H includes the σ /mean ratios for all 89 3 digit occupations (rows 4-92) for part-time workers. Column N includes the σ /mean ratios for all 25 2 digit occupations (rows 4-28) for part-time workers. Column T includes the σ /mean ratios for all 9 1 digit occupations (rows 4-12) for part-time workers. Full-time and Part-time Workers Column D includes the σ /mean ratios for all 369 4 digit occupations (rows 4-372) for full-time and part-time workers. The ratio for all occupations is in row 373. Column I includes the σ /mean ratios for all 89 3 digit occupations (rows 4-92) for full-time and part-time workers. Column O includes the σ /mean ratios for all 25 2 digit occupations (rows 4-28) for full-time and part-time workers. Column U includes the σ /mean ratios for all 9 1 digit occupations (rows 4-12) for full-time and part-time workers.

=Worksheet ‘MeanMedian’= Column A contains the 4 digit occupation code and total (rows 4-373). Columns B and C contain mean/median ratios for full-time males and females for all 369 4 digit occupations (rows 4-372). The ratio for all occupations is in row 373. Columns E and F contain mean/median ratios for part-time males and females for all 369 4 digit occupations (rows 4-372). The ratio for all occupations is in row 373. Columns H and I contain mean/median ratios for full-time and part-time workers (males and females combined together) for all 369 4 digit occupations (rows 4-372). The ratio for all occupations is in row 373. Columns K and L contain mean/median ratios for males and females workers (full-time and part-time combined together) for all 369 4 digit occupations (rows 4-372). The ratio for all occupations is in row 373. Columns N contains mean/median ratios for all 369 4 digit occupations (full-time and part-time, males and females all combined together) (rows 4-372). The ratio for all occupations is in row 373. Column P contains the 3 digit occupation code (rows 4-92). Columns Q and R contain mean/median ratios for full-time males and females for all 89 3 digit occupations (rows 4-92). Columns S and T contain mean/median ratios for part-time males and females for all 89 3 digit occupations (rows 4-92). Columns U and V contain mean/median ratios for full-time and part-time workers (males and females combined together) for all 89 3 digit occupations (rows 4-92). Columns W and X contain mean/median ratios for males and females workers (full-time and part-time combined together) for all 89 3 digit occupations (rows 4-92). Columns Y contains mean/median ratios for all 89 3 digit occupations (full-time and part-time, males and females all combined together) (rows 4-92).

Column AB contains the 2 digit occupation code (rows 4-28). Columns AC and AD contain mean/median ratios for full-time males and females for all 25 2 digit occupation code (rows 4-28). Columns AE and AF contain mean/median ratios for part-time males and females for all 25 2 digit occupation code (rows 4-28). Columns AG and AH contain mean/median ratios for full-time and part-time workers (males and females combined together) for all 25 2 digit occupation code (rows 4-28). Columns AI and AJ contain mean/median ratios for males and females workers (full-time and part-time combined together) for all 25 2 digit occupation code (rows 4-28). Columns AK contains mean/median ratios for all 25 2 digit occupations (full-time and part-time, males and females all combined together) (rows 4-28).

Column AO contains the 1 digit occupation code (rows 4-12). Columns AP and AQ contain mean/median ratios for full-time males and females for all 9 1 digit occupation code (rows 4-12). Columns AR and AS contain mean/median ratios for part-time males and females for all 9 1 digit occupation code (rows 4-12). Columns AT and AU contain mean/median ratios for full-time and part-time workers (males and females combined together) for all 9 1 digit occupation code (rows 4-12). Columns AV and AW contain mean/median ratios for males and females workers (full-time and part-time combined together) for all 9 1 digit occupation code (rows 4-12). Columns AX contains mean/median ratios for all 9 1 digit occupations (full-time and part-time, males and females all combined together) (rows 4-12).

Ratios for estimating Pay by age (age ratios.xlsx)

FIELDS AND COLUMNS

The worksheet ‘ratios by age’ contains ratios between Pay of a particular age category and mean Pay of all ages for a particular combination (defined by four dimensions including occupation, gender, full-time or part-time working and the highest level of qualification). Column A is the group ID (360 categories, including totals and sub-totals) Column B to column E are the four dimensions used to define the group (occupation (9 SOC2010 1 digit categories plus Total); Gender (M, F and Total); Status (PT, FT and Total) and Qualification (3 broad levels of QCF plus total – QCF4-8, QCF1-3 and No qualification).

Columns F to column AY provide the ratios within a particular group by age (between 20-65).

Generation of estimates of medians and deciles and age.

There are two possibilities:

  1. Generate an estimate by age on the fly using the ratios provided in the age ratios.xlsx file and then apply the distribution equation;
  2. Apply the distribution equation first and then the ratios from age ratios file.

There is no obviously correct approach. However, the age ratios are based on mean values, so it is not certain that it will generate sensible results if applied to median or decile values (Option 2). The distribution equation is designed to focus on patterns around the mean, so Option 1 is regarded as preferable.

In dealing with queries about medians and deciles when age is also of interest therefore it is recommended that we apply equation (1) first and then equation (2).

Examples - Main data base

Mean pay for particular categories

Example 1

Female, Full-time, 'Senior police officers', QCF2 (GCSE (A-C)), in the West Midlands.

Gender=2
FTPT=1
Region=5
QCF=7
Occupation=1172

The estimated result of pay=750.
The estimated result of employment=97.

Example 2

Male, Full-time, 'Artists', QCF6 (First Degree), in the East of England.

Gender=1
FTPT=1
Region=3
QCF=3
Occupation=3411

The estimated result of pay=556.
The estimated result of employment=247.

Example 3

Male, Part-time, 'Bar staff', QCF7 ('Other higher degree'), in London.

Gender=2
FTPT=2
Region=1
QCF=2
Occupation=9274

The estimated result of pay=155.
The estimated result of employment=741.

Use of age ratios

Result for a particular age

Example 1

Note: for the following example, the user will first need to extract the mean Pay from the Main Pay Data set with the same characteristics. The mean Pay for the example below is: 850, the ratio for this example is: 0.93143. Age: ‘33’, Occupation: ‘3. Associate professional and technical occupations’, Gender: ‘1. Males’, FT/PT: ‘1. Full-time employees’, Qualification: ‘1. High’

age=33
occupation=3
gender=1
ft/pt=1
qualification=1
The Pay estimate result with these characteristics is: 791.63.

Example 2

Note: for the following example, the user will first need to extract the mean Pay from the Main Pay Data set with the same characteristics. The mean Pay for the example below is: 169, the ratio for this example is 1.03717. Age: ‘50’, Occupation: ‘9. Elementary occupation’, Gender: ‘0. Females’, FT/PT: ‘2. Full-time & Part-time’, Qualification: ‘3. Low’

age=50
occupation=9
gender=0
ft/pt=2
qualification=3

The Pay estimate result with these characteristics is: 175.50.

Example 3

Note: for the following example, the user will first need to extract the mean Pay from the Main Pay Data set with the same characteristics. The mean Pay for the example below is: 682, the ratio for this example is 0.43136. Age: ‘25’, Occupation: ‘1. Managers and senior officials’, Gender: ‘1. Males’, FT/PT: ‘1. Full-time employees’, Qualification: ‘3. Low’

age=25
occupation=1
gender=1
ft/pt=1
qualification=3

The Pay estimate result with these characteristics is: 294.27.

Example 4

Note: for the following example, the user will first need to extract the mean Pay from the Main Pay Data set with the same characteristics. The mean Pay for the example below is: 147, the ratio for this example is 0.99202. Age: ‘60’, Occupation: ‘5. Skilled trades occupations’, Gender: ‘0. Females’, FT/PT: ‘0. Part-time employees’, Qualification: ‘2. Medium’

age=60
occupation=5
gender=0
ft/pt=0
qualification=2

The Pay estimate result with these characteristics is: 145.96.

Example 5 (4 Digit)

Note: for the following example, the user will first need to extract the mean Pay from the Main Pay Data set with the same characteristics. The mean Pay for the example below is: 543, the ratio for this example is 1.11773. Age: ‘33’, Occupation: ‘3416 Arts officers, producers and directors’, Gender: ‘0. Females’, FT/PT: ‘2. Full-time & Part-time’, Qualification: ‘3. Low’

age=33
occupation=3416
gender=0
ft/pt=2
qualification=3

The Pay estimate result with these characteristics is: 606.55.

Example 6 (4 Digit)

Note: for the following example, the user will first need to extract the mean Pay from the Main Pay Data set with the same characteristics. The mean Pay for the example below is: 952, the ratio for this example is 1.11480. Age: ‘50’, Occupation: ‘1131 Financial managers and directors’, Gender: ‘1. Males’, FT/PT: ‘1. Full-time employees ’, Qualification: ‘3. Low’

age=50
occupation=1131
gender=1
ft/pt=1
qualification=3

The Pay estimate result with these characteristics is: 1061.69.

Example 7 (4 Digit)

Note: for the following example, the user will first need to extract the mean Pay from the Main Pay Data set with the same characteristics. The mean Pay for the example below is: 781, the ratio for this example is 0.99053. Age: ‘33’, Occupation: ‘4161 Office managers’, Gender: ‘1. Males’, FT/PT: ‘1. Full-time employees ’, Qualification: ‘1. High’

age=33
occupation=4161
gender=1
ft/pt=1
qualification=1

The Pay estimate result with these characteristics is: 773.29.

Example 8 (4 Digit)

Note: for the following example, the user will first need to extract the mean Pay from the Main Pay Data set with the same characteristics. The mean Pay for the example below is: 394, the ratio for this example is 1.09233. Age: ‘33’, Occupation: ‘3416 Arts officers, producers and directors’, Gender: ‘2. Males and females’, FT/PT: ‘2. Full-time & Part-time’, Qualification: ‘3. Low’

age=33
occupation=3416
gender=2
ft/pt=2
qualification=3

The Pay estimate result with these characteristics is: 430.61.

Use of "median deciles.xlsx"

Result for a particular decile

Example 1 (median & decile)

Take occupation 1131 'Chartered Secretaries', for example, to generate the median and decile pay for this occupation (1131), the user needs to first get the mean gross weekly pay from the main dataset and the sd/(log mean pay) ratio from the 'Median and deciles.xlsx' workbook and then calculate natural log mean pay and standard deviation.

Occupation 1131 (Chartered Secretaries) Example (Males & Females)

1131 - FT1131 - PTFT&PT
mean gross weekly pay 1394.38 441.74 1201.27
log mean pay 7.09 5.72 6.96
Median Pay 1200.14 303.44 1053.55
sd/mean ratio 0.10626 0.15537 0.12808
mean/median ratio 1.16185 1.45577 1.14022
SD 0.75 0.89 0.89

Estimated Median and Deciles

z scoreFTPTFT & PT
10th decile-1.28457.5297.38336.61
20th decile-0.84637.36143.92498.27
30th decile-0.52811.12191.22662.74
40th decile-0.25994.10243.03843.08
median0.001200.14303.441053.55
60th decile0.251448.87378.861316.54
70th decile0.521775.72481.511674.79
80th decile0.842259.85639.762227.62
90th decile1.283148.09945.583297.45

Limits on estimates by age

In principle the equations enable estimates to be computed for any age.

In practice it probably makes sense to restrict this to at least those of typical working age (.e.g. from 16+ to typical retirement age (e.g 65)).

For some occupations further restrictions might be sensible to avoid users requesting estimates for combinations of characteristics that are unlikely (e.g. Doctors age 17). However it is difficult to devise general rules that are always applicable.

For the present it is suggested that calculations are simply limited between 20 and 65.

Classifications and aggregations

As for employment - please follow hyperlink.

Rules for suppressing data or raising warning flags

The rules of thumb used are:

  1. If the numbers employed in a particular category / cell (defined by the 12 regions, gender, status, occupation, qualification and industry (75 categories)) are below 1,000 then a query should return “no reliable data available” and offer to go up a level of aggregation across one or more of the main dimensions (e.g. UK rather than region, some aggregation of industries rather than the 75 level, or SOC 2 digit rather than 4 digit). This information is held in the variable 'weight' in the Working Futures employment file ('WFDataOcc4Dig.csv') and in the Pay file ('ashe_pay_main.csv') 'Employment' (same in both files).
  2. If the numbers employed in a particular category / cell (defined as in 1.) are between 1,000 and 10,000 then a query should return the number but with a flag to say that this estimate is based on a relatively small sample size and if the user requires more robust estimates they should go up a level of aggregation across one or more of the main dimensions (as in 1).

This is done not only for any queries about Employment (including Replacement Demand calculations) but also for Pay and Hours. In the case of Pay and Hours the API interrogates the part of the database holding the employment numbers to do the checks, as in 1.and 2. above, but then reports the corresponding pay or hours values as appropriate.

The pay file provided (ashe_pay_main.csv) has columns Employment and PayBill. The former is the employment data that should be used as weights. This is intended to be “self-contained”. There is no need to use the separate Working Futures data upon which it is based. The employment weights used are a subset of the Working Futures employment time series data set, but contain data for just one year. They differ from the Working Futures data in various other minor respects (for example they omit Armed Forces). These data should be used as employment weights for anything to do with Pay or Hours in the ASHE section of the LMI for All database. The reference to Working Futures in the documentation is intended simply to explain how the weights have been derived.

Rounding of estimates

In order to avoid false impressions of precision the API rounds up the estimates before delivering the answer to any query. In the case of weekly pay any number should be rounded to the nearest ten pounds.

Closing Notes

data/specsheet_ashepay.txt · Last modified: 2019-09-03 14:32 by Luke Bosworth