LMI For All

data:specsheet_ashepay

data:specsheet_ashepay [2019-09-03 14:32] (current), Luke Bosworth
Previous revision: data:specsheet_ashepay [2017-08-11 16:00], Luke Bosworth

===== Data File(s) =====
  
This data load contains the following files:
  
^ File ^ Content ^ Size (bytes) ^
| ashe_pay_main.csv | Average weekly pay including overtime | 164,739,426 |
| ashe_age_coeffs.csv | Parameters to generate estimates by age "on the fly" | 26,596 |
| ashe_age_values.csv | Parameters to generate estimates by age "on the fly" | 1,132,352 |
| ashe_median_deciles.csv | Median and deciles estimates based on normal distribution of log(Pay), based on LFS, for FT and PT separately | 164,739,426 |
| wf_occupations.csv | Occupational id, codes and descriptions | 15,304 |
===== Source Dataset =====
  
The Pay estimates are based on a combination of data from ASHE (Annual Survey of Hours and Earnings) and the Labour Force Survey (LFS). At the time of this update, the most recent ASHE available in the Secure Data Lab was ASHE 2017. The 2018 pay estimates were therefore generated from ASHE data for 2016 and 2017 and LFS data for 2017 and 2018, on the assumption that pay estimated from ASHE 2016 and 2017 does not differ greatly from pay estimated from ASHE 2017 and 2018. Thanks are due to the Secure Data Service at the UK Data Archive for providing access to the Annual Survey of Hours and Earnings (ASHE) data to enable the econometric analysis on which these numbers are based.
  
===Distinctions between Pay, Wage and Earnings===
  
===General description of the Pay data===
The use of “raw” data from the LFS or ASHE in the LMI for All data portal is limited by sample size and by concerns about confidentiality. Relying on the “raw” data would leave large gaps in the information the portal could present. To get around these limitations, the portal uses “predicted Pay” estimates based on an econometric analysis of the ASHE and LFS data sets. The econometric analysis uses a standard “Mincerian” earnings equation (the “main” earnings equation, described below). To provide additional detail by age, as well as features of the Pay distribution such as deciles, supplementary equations are also used. Full details of the approach can be found in Li and Wilson (2016).

Compared to the LFS, ASHE has two advantages: its Pay information is more reliable, because it is provided by employers rather than by individuals, and its sample size is larger. However, ASHE holds more limited information on individual characteristics and has no information on education or qualifications. To get around these problems, the LMI for All database is based on a set of estimates/predictions of Pay using data from both ASHE and the LFS. The predicted Pay estimates in the 'PayFTPT.lasc' file are generated using the main earnings equation. These initial predictions are then adjusted using an iterative RAS procedure to match the published Pay figures from ASHE and the LFS across each of the main dimensions/characteristics (gender, region, industry, occupation and qualification). Again, details can be found in Li and Wilson (2016).

To generate predictions of Pay by age, supplementary age equations are estimated separately for full-time and part-time workers, on the assumption that the various factors affect full-time and part-time workers differently. These results are then used to predict Pay by age from the mean value of Pay for all ages. Similarly, predicted median and decile Pay levels are based on parametric methods and the assumption that Pay is log-normally distributed.

There is no Pay data available for the occupation 'Armed Forces', and therefore no employment data for 'Armed Forces' in the 'ashe_pay_main.csv' file. However, 'Armed Forces' employment is included in the main Working Futures employment data (WFDataOcc4Digit.csv).
  
<note>The pay file provided (ashe_pay_main.csv) includes the columns Employment and PayBill. Employment holds the employment data that should be used as weights. The file is intended to be “self-contained”: there is no need to use the separate //Working Futures// data on which it is based. The employment weights are a subset of the //Working Futures// employment time series data set, but contain data for just one year. They also differ from the //Working Futures// data in various other minor respects (for example, they omit Armed Forces).
  
These data should be used as employment weights for anything to do with Pay or Hours in the ASHE section of the LMI for All database.
Interactive terms are included to detect heterogeneity across different groups:
  * Gender by occupation: gender is interacted with 4-digit occupation categories to control for Pay differences between men and women within each occupation. The base group is female Chief executives and senior officials.
  * Industry by time trend: a time trend variable is created for 2017 and 2018 in the LFS analysis and for 2016 and 2017 in the ASHE analysis. It is interacted with industries to control for time trend differences within each industry. The base groups are industries in 2017 in the LFS estimation and industries in 2016 in the ASHE estimation.
  * Occupation by time trend: the time trend is also interacted with occupations to control for time trend differences within each occupation. The base groups are occupations in 2017 in the LFS analysis and occupations in 2016 in the ASHE analysis.
Using ASHE data, separate equations are first estimated for full-time (FT) and part-time (PT) workers.
A second set of equations is estimated using LFS data. These include a variable indicating the highest qualification held by the individual.
The estimated coefficients of the independent variables and the constant term are then used to derive the expected Pay for an individual with certain characteristics (as defined by the variables included). The default reference group in the LFS is female workers living in London, with highest qualification QCF8, working in the Agriculture sector as Chief executives or senior officials in 2017; the default reference group in ASHE is the same group of people in 2016. The log expected Pay for an individual with these default characteristics at a given age is the sum of: the coefficient on age times age; the coefficient on age squared times age squared; and the constant term.

The log expected Pay for people with other characteristics is obtained simply by adding the coefficients for the relevant dummy variables and interaction terms to this default log expected Pay. For example, for a male worker who otherwise has the default characteristics, log expected Pay is the default log expected Pay plus the estimated coefficient on the male dummy. To obtain expected Pay, the log values are converted back to Pay by exponentiating:
  
<code>Pay = EXP(log expected Pay)</code>
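A minimal sketch of this calculation in Python, assuming the coefficients have already been read from the estimated equation: the default log expected Pay is built from the constant, age and age-squared terms, the coefficients of any dummy or interaction terms that apply are added, and the result is exponentiated. The function names and all coefficient values are hypothetical placeholders, not estimates from ASHE or the LFS.

```python
import math


def log_expected_pay(constant, b_age, b_age_sq, age, dummy_coefs=()):
    """Log expected Pay for one combination of characteristics.

    constant, b_age and b_age_sq give the log expected Pay of the default
    reference group at the given age; dummy_coefs holds the coefficients of
    whichever dummy variables and interaction terms apply to the individual.
    """
    default_log_pay = constant + b_age * age + b_age_sq * age ** 2
    return default_log_pay + sum(dummy_coefs)


def expected_pay(log_pay):
    """Pay = EXP(log expected Pay)."""
    return math.exp(log_pay)


# Hypothetical coefficients, for illustration only.
CONSTANT, B_AGE, B_AGE_SQ, B_MALE = 4.0, 0.06, -0.0006, 0.10

default_log = log_expected_pay(CONSTANT, B_AGE, B_AGE_SQ, age=40)
male_log = log_expected_pay(CONSTANT, B_AGE, B_AGE_SQ, age=40, dummy_coefs=[B_MALE])
print(round(expected_pay(default_log), 2), round(expected_pay(male_log), 2))
```

Because the coefficients are added on the log scale, each dummy acts multiplicatively on Pay: with these placeholder values the male prediction is exp(0.10), roughly 1.105 times the default one.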
  
These predictions of Pay, by the various characteristics identified in the list of independent variables above, form the basis for the initial estimates included in the file 'ashe_pay_main.csv'. The predictions are made for people of average age in the category concerned.
The estimates of the average age for each category or combination are based on data from the LFS. The corresponding information from ASHE is not available because of the disclosure risks posed by small sample sizes in each combination, so the age distributions in the LFS and ASHE are assumed to be similar.
===Constraints to match published data===
To ensure the final Pay estimates are consistent with the published data, the detailed database in 'ashe_pay_main.csv' is adjusted using iterative RAS techniques to match published official data. When the Pay estimates are multiplied by the corresponding employment numbers and summed, they match the corresponding ASHE totals at the 4-digit occupation level. For further details see Li and Wilson (2016).
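RAS is a form of iterative proportional fitting. The sketch below shows only the generic two-dimensional version: a matrix of initial estimates (for example paybill cells, i.e. Pay times employment) is rescaled row by row and then column by column until both sets of totals match their targets. The matrix, targets, tolerance and function name are hypothetical; the procedure actually used, with more dimensions and its own convergence rules, is the one described in Li and Wilson (2016).

```python
def ras(matrix, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale a non-negative matrix so its row and column sums match targets.

    Row and column targets must share the same grand total for the
    iteration to converge. Rows or columns summing to zero are left as-is.
    """
    m = [[float(x) for x in row] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(max_iter):
        # R step: scale each row to its target total.
        for i in range(n_rows):
            s = sum(m[i])
            if s > 0:
                factor = row_targets[i] / s
                m[i] = [x * factor for x in m[i]]
        # S step: scale each column to its target total.
        for j in range(n_cols):
            s = sum(m[i][j] for i in range(n_rows))
            if s > 0:
                factor = col_targets[j] / s
                for i in range(n_rows):
                    m[i][j] *= factor
        # Columns match after the S step; stop once the rows match too.
        if all(abs(sum(m[i]) - row_targets[i]) <= tol for i in range(n_rows)):
            return m
    raise RuntimeError("RAS did not converge")


# Hypothetical initial paybill estimates (rows: occupations, columns: regions).
initial = [[100.0, 50.0], [80.0, 70.0]]
adjusted = ras(initial, row_targets=[160.0, 140.0], col_targets=[190.0, 110.0])
for row in adjusted:
    print([round(x, 2) for x in row])
```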
  
===Estimates of Pay by Age===
  * PredictedPay is the predicted Pay for an individual aged between 20 and 65 in the category of interest (a particular combination of characteristics defined by 1-digit occupation, gender, full-time or part-time status, and three-category qualification).
  * R is the ratio (predicted Pay at a given age)/(mean Pay of all ages) in the same category.
  * meanpay is the average (mean) level of Pay for all ages in the category of interest, taken from the file 'ashe_pay_main.csv'. (Note that the mean Pay values in 'ashe_pay_main.csv' are based on a combination of LFS and ASHE data, so they differ slightly from the values used to estimate equation (1), which uses only LFS data.)
  
16,560 ratios are calculated: one for each of the 360 combinations at each age from 20 to 65. For a query about Pay estimates, the four dimensions (occupation, gender, full-time or part-time status and qualification) are used to identify the corresponding combination, and the age the query is intended for determines which ratio within that combination is used. If the query does not involve these dimensions, for example a query about a particular industry or region, the ratio for “ALL” (or some other relevant sub-total) is used.
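A minimal sketch of the age look-up, assuming the ratios have been loaded into a dictionary keyed by the four dimensions plus age, with "ALL" standing for the total over a dimension. Since R is defined as (predicted Pay at a given age)/(mean Pay of all ages), the age-specific estimate is R multiplied by meanpay. The key layout, category labels and ratio values are hypothetical.

```python
AGES = range(20, 66)  # ages 20 to 65 inclusive: 46 ages, and 360 x 46 = 16,560

# Hypothetical ratios R, keyed by
# (1-digit occupation, gender, status, qualification, age).
# "ALL" stands for the total over a dimension. Values are placeholders.
RATIOS = {
    ("2", "male", "FT", "high", 30): 0.92,
    ("ALL", "ALL", "ALL", "ALL", 30): 0.97,
}


def age_ratio(age, occupation="ALL", gender="ALL", status="ALL", qualification="ALL"):
    """Look up R; dimensions the query does not involve default to "ALL"."""
    if age not in AGES:
        raise ValueError("age ratios are only provided for ages 20 to 65")
    return RATIOS[(occupation, gender, status, qualification, age)]


def predicted_pay_by_age(mean_pay, ratio):
    """PredictedPay = R * meanpay, since R = PredictedPay / meanpay."""
    return ratio * mean_pay


# A query about a region involves none of the four dimensions, so it uses "ALL".
print(predicted_pay_by_age(500.0, age_ratio(30)))
print(predicted_pay_by_age(500.0, age_ratio(30, "2", "male", "FT", "high")))
```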
  
====Estimates of Medians and Deciles====
To generate predictions of Pay medians and deciles, supplementary “distribution equations” are also used, based on analysis of both LFS and ASHE data. In this case the analysis assumes that Pay is log-normally distributed. The file 'ashe_median_deciles.csv' contains mean/median ratios and (standard deviation)/mean ratios for full-time and part-time employees separately, based on analysis of data from ASHE. The estimates of the median and deciles assume a log-normal distribution, using the mean and standard deviation estimated from the source data (LFS or ASHE). The formula used to compute the median and other deciles from the (log-)normal distribution is referred to as the “distribution equation” and is set out in equations (2) and (2a):
  
<code>Median Pay = Mean Pay * (1/r)    (2)</code>
([[http://www.regentsprep.org/regents/math/algtrig/ATS7/ZChart.htm|source]])
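The sketch below shows the textbook log-normal calculation behind such estimates: the median follows from equation (2), and each decile is the median multiplied by exp(z × σ), where σ is the standard deviation of log Pay and z is the standard normal score for that decile (obtained here from Python's `statistics.NormalDist` rather than a printed table). It is an illustration under the log-normal assumption and may differ in detail from equation (2a); the input values are hypothetical.

```python
import math
from statistics import NormalDist


def median_from_mean(mean_pay, r):
    """Equation (2): Median Pay = Mean Pay * (1/r), r = mean/median ratio."""
    return mean_pay / r


def pay_deciles(mean_pay, r, sigma):
    """Deciles of a log-normal Pay distribution.

    sigma is the standard deviation of log Pay. Under log-normality the
    q-th quantile is exp(mu + z_q * sigma), with mu = log(median).
    """
    median = median_from_mean(mean_pay, r)
    std_normal = NormalDist()
    deciles = {}
    for d in range(1, 10):
        z = std_normal.inv_cdf(d / 10)  # e.g. about 1.2816 for the 90th percentile
        deciles[d * 10] = median * math.exp(z * sigma)
    return deciles


# Hypothetical inputs: mean weekly pay 500, mean/median ratio 1.1, sigma 0.5.
for pct, value in pay_deciles(500.0, 1.1, 0.5).items():
    print(pct, round(value, 2))
```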
  
The 'ashe_median_deciles.csv' file also contains estimates of typical values of the ratio of the standard deviation of log Pay to the mean value, for each of the main categories of interest (4-digit occupation and FT/PT status). The focus is on how median Pay (and other deciles) vary around mean Pay. This assumes that the Pay distributions are otherwise the same across other dimensions such as gender, industry, region and qualification. The formula used to compute the median and other deciles from the (log-)normal distribution is shown in [[specsheet_ashepay#Estimates of Medians and Deciles|Equation (2)]] above.

Ideally, estimates of σ would be available for all the main dimensions, but the limited sample sizes in both ASHE and the LFS make this impossible for all possible permutations and combinations. Typical values are therefore assumed, based on variation across the main dimensions of interest (but not all possible cross-dimensions). The focus is on variation by status (FT/PT) and occupation, since inspection of the data suggests this is where the variation is greatest. Values of σ have therefore been estimated across these main dimensions, and similar patterns are assumed to apply across all other dimensions for the purpose of this calculation: the ratio of σ to the mean value is assumed to be fixed across all the other dimensions. Using these ratios (which are differentiated by 4-digit occupation and full-time or part-time status), values of σ are generated within the API for all possible permutations and combinations. These estimated standard deviations are then used “on the fly” to predict median Pay and other deciles from the mean values extracted from 'ashe_pay_main.csv'.
The ratios of σ to the mean value are provided in the 'ashe_median_deciles.csv' file. Ratios for totals and sub-totals of occupations at the 1-digit, 2-digit, 3-digit and 4-digit levels, and for full-time or part-time status, are also calculated. If the query does not involve the occupation or full-time/part-time dimensions, the ratio for “ALL” (or some other relevant sub-total) should be used.
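The ratio look-up can be sketched as follows, assuming the ratios are held in a dictionary keyed by occupation code and FT/PT status. Dimensions a query does not involve are replaced by "ALL"; falling back from a 4-digit code to its 3-, 2- and 1-digit sub-totals is a hypothetical policy chosen for illustration, as are all keys and values. How the ratio is turned into σ (in particular, which mean it multiplies) is left as an explicit assumption in the code.

```python
# Hypothetical sigma/mean ratios keyed by (occupation code, status).
# Occupation codes may be 1- to 4-digit SOC2010 codes or "ALL";
# status is "FT", "PT" or "ALL". All values are placeholders.
SIGMA_RATIOS = {
    ("1115", "FT"): 0.070,
    ("11", "FT"): 0.075,
    ("ALL", "FT"): 0.080,
    ("ALL", "PT"): 0.090,
    ("ALL", "ALL"): 0.085,
}


def sigma_ratio(occupation=None, status=None):
    """Return the sigma/mean ratio for a query.

    Dimensions the query does not involve become "ALL". If a code has no
    entry, shorter SOC prefixes (sub-totals) are tried before the overall
    total for that status; this fallback order is an assumption.
    """
    status = status or "ALL"
    if occupation:
        for length in range(len(occupation), 0, -1):
            key = (occupation[:length], status)
            if key in SIGMA_RATIOS:
                return SIGMA_RATIOS[key]
    return SIGMA_RATIOS[("ALL", status)]


def sigma_from_ratio(ratio, mean_value):
    """sigma = (sigma/mean ratio) * mean value.

    Assumption: the caller supplies the mean the ratio refers to; since
    ashe_pay_main.csv is not logged, one reading is the log of mean Pay.
    """
    return ratio * mean_value


print(sigma_ratio(), sigma_ratio("1116", "FT"), sigma_ratio(status="PT"))
```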
  
===== Fields and Columns =====
  
===Mean Pay by SOC2010 4-digit category (ashe_pay_main.csv)===
The 'ashe_pay_main.csv' file contains the mean pay for each of the combinations defined by the following dimensions:
  * year - 1 (2018)
  * gender - 2 (male and female)
  * status - 2 (full-time and part-time)
  
Mean pay is PayBill divided by Employment.
To aggregate, sum these two columns separately and then divide the total PayBill by the total Employment to get mean pay.
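The aggregation rule can be sketched in a few lines of Python. The two cells below are hypothetical; only the column names Employment and PayBill come from the file description.

```python
# Hypothetical cells from ashe_pay_main.csv (only the two columns needed).
cells = [
    {"Employment": 1200.0, "PayBill": 600000.0},  # cell mean pay 500
    {"Employment": 800.0, "PayBill": 320000.0},   # cell mean pay 400
]


def aggregate_mean_pay(cells):
    """Mean pay of a group of cells: sum of PayBill / sum of Employment."""
    total_employment = sum(cell["Employment"] for cell in cells)
    total_paybill = sum(cell["PayBill"] for cell in cells)
    if total_employment == 0:
        raise ValueError("the selected cells have no employment")
    return total_paybill / total_employment


print(aggregate_mean_pay(cells))  # 460.0
```

A simple average of the two cell means would give 450; summing PayBill and Employment first weights each cell by its employment and gives 460.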
  
====ashe_age_coeffs.csv / ashe_age_values.csv====
To generate predictions of Pay by age, and to indicate how Pay in each age category varies from mean Pay, “supplementary age equations” have been estimated. These allow typical ratios of the Pay of a particular age category to the mean Pay of all ages to be calculated for particular combinations. This is done for four main dimensions: occupation, gender, full-time or part-time working, and highest level of qualification. Pay for a particular age category is predicted using the parameters from a regression. The files 'ashe_age_coeffs.csv' and 'ashe_age_values.csv' contain the ratios necessary to compute Pay by age for different combinations, based on ASHE.
  
  
===ashe_median_deciles.csv===
To generate predictions of pay medians and deciles, “supplementary” equations are again used ([[specsheet_ashepay#Estimates of Medians and Deciles|see equation (2) above]]). The file contains the parameters necessary to compute the median and deciles for full-time and part-time employees separately, based on LFS data. Gross weekly pay is transformed with the natural logarithm so that, under the log-normal assumption, it follows a normal distribution.
  
The 'ashe_median_deciles.csv' file contains:
  * z scores (as in the table above)
  * σ / mean ratios for FT, PT and occupations
  * Mean/median ratios for FT, PT, Males, Females and occupations
  
The mean/median ratios, σ/mean ratios and z scores, together with the mean values, allow estimates of medians and deciles to be generated across the four main dimensions: occupation, gender, full-time/part-time status and qualification. Note the need to apply the log transformation: the data in the //ashe_pay_main.csv// file are NOT logged.
  
=====Output=====
The 'ashe_pay_main.csv' file contains predictions/estimates of mean Pay for each of the 369 SOC2010 Unit Groups, broken down by the other main dimensions (including industry, region, gender and qualification). These predictions are as described in the subsection on the general description of the Pay data above. The estimates are based on a combination of LFS and ASHE data and include separate estimates for full-time (FT) and part-time (PT) employees.
  
====Queries and calculations====
===Pay Data Specification (ashe_pay_main.csv)===
The first column is the year.
The second to seventh columns cover gender, status (FT/PT), industry, occupation, geography and highest qualification held. These show the characteristics of the people covered by the dataset.
There are two possibilities:
  - Generate an estimate by age on the fly using the ratios provided in the age ratio files (ashe_age_coeffs.csv and ashe_age_values.csv), and then apply the distribution equation;
  - Apply the distribution equation first, and then the ratios from the age ratio files.
    
There is no obviously correct approach. However, the age ratios are based on mean values, so it is not certain that they will generate sensible results if applied to median or decile values (Option 2). The distribution equation is designed to describe patterns around the mean, so Option 1 is regarded as preferable.
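Option 1 can be sketched as a two-step pipeline: first scale the category mean by the age ratio R, then apply the distribution equation to that age-specific mean. As a simplifying assumption, σ is taken as a given standard deviation of log Pay, the deciles use the textbook log-normal formula, and all inputs are hypothetical.

```python
import math
from statistics import NormalDist


def pay_by_age(mean_pay, age_ratio):
    """Step 1: age-specific mean Pay, PredictedPay = R * meanpay."""
    return age_ratio * mean_pay


def apply_distribution_equation(mean_pay, r, sigma):
    """Step 2: median from equation (2); other deciles assume log-normal Pay.

    r is the mean/median ratio and sigma the standard deviation of log Pay.
    """
    median = mean_pay / r
    z = NormalDist().inv_cdf
    return {d * 10: median * math.exp(z(d / 10) * sigma) for d in range(1, 10)}


def option_1(mean_pay, age_ratio, r, sigma):
    """Age ratio applied to the mean first, then the distribution equation."""
    return apply_distribution_equation(pay_by_age(mean_pay, age_ratio), r, sigma)


# Hypothetical inputs: category mean 500, R = 0.92 at the chosen age, r = 1.1, sigma = 0.5.
for pct, value in option_1(500.0, 0.92, 1.1, 0.5).items():
    print(pct, round(value, 2))
```

With σ held fixed as here, both steps are multiplicative, so reversing them (Option 2) would give the same numbers in this sketch; the two orderings can only diverge when σ itself depends on the value it is applied to, as when it is derived from the σ/mean ratio.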
Line 283: Line 285:
 The estimated result of pay=556.\\
 The estimated result of employment=247.\\
+
 ====Example 3====
 Male, Part-time, 'Bar staff', QCF7 ('Other higher degree'), in London.
Line 453: Line 456:
 ===Rules for suppressing data or raising warning flags===
 The rules of thumb used are:
-  - If the numbers employed in a particular category / cell (defined by the 12 regions, gender, status, occupation, qualification and the 75 industry categories) are below 1,000, then a query should return “no reliable data available” and offer to go up a level of aggregation across one or more of the main dimensions (e.g. UK rather than region, some aggregation of industries rather than the 75-industry level, or SOC 2-digit rather than 4-digit). This information is held in the variable 'weight' in the Working Futures employment file ('WFDataOcc4Dig.csv') and in the variable 'Employment' in the Pay file ('Pay-20160422.asc'); the values are the same in both files.
+  - If the numbers employed in a particular category / cell (defined by the 12 regions, gender, status, occupation, qualification and the 75 industry categories) are below 1,000, then a query should return “no reliable data available” and offer to go up a level of aggregation across one or more of the main dimensions (e.g. UK rather than region, some aggregation of industries rather than the 75-industry level, or SOC 2-digit rather than 4-digit). This information is held in the variable 'weight' in the Working Futures employment file ('WFDataOcc4Dig.csv') and in the variable 'Employment' in the Pay file ('ashe_pay_main.csv'); the values are the same in both files.
   - If the numbers employed in a particular category / cell (defined as in 1.) are between 1,000 and 10,000, then a query should return the number, but with a flag saying that the estimate is based on a relatively small sample size and that, if the user requires more robust estimates, they should go up a level of aggregation across one or more of the main dimensions (as in 1.).
 This is done not only for any queries about Employment (including Replacement Demand calculations) but also for Pay and Hours.
 In the case of Pay and Hours, the API interrogates the part of the database holding the employment numbers to do the checks in 1. and 2. above, but then reports the corresponding pay or hours values as appropriate.
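The sketch below applies the two rules of thumb to the employment count of the cell being queried. The text above does not say how the exact boundary values (1,000 and 10,000) are treated, so this sketch assumes that 1,000 is flagged rather than suppressed and that 10,000 is not flagged. `Reliability`, `check_cell` and `report_value` are hypothetical names.

```python
from enum import Enum
from typing import Optional, Tuple


class Reliability(Enum):
    SUPPRESS = "no reliable data available"                        # rule 1
    FLAG = "small sample: consider a higher level of aggregation"  # rule 2
    OK = "ok"


def check_cell(employment: float) -> Reliability:
    """Classify a cell by its employment count (boundary handling assumed)."""
    if employment < 1_000:
        return Reliability.SUPPRESS
    if employment < 10_000:
        return Reliability.FLAG
    return Reliability.OK


def report_value(value: float,
                 employment: float) -> Tuple[Optional[float], Reliability]:
    """Check the cell's employment count, then report the pay or hours value.

    The value is withheld (None) when the cell is suppressed.
    """
    status = check_cell(employment)
    if status is Reliability.SUPPRESS:
        return None, status
    return value, status
```

The check runs on the employment count even when the query is about Pay or Hours. Only the reported value changes.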
  
-<note>The pay file provided (Pay-20160422.asc) has columns Employment and PayBill. The former is the employment data that should be used as weights. The file is intended to be “self-contained”: there is no need to use the separate Working Futures data upon which it is based. The employment weights used are a subset of the Working Futures employment time series data set, but contain data for just one year. They differ from the Working Futures data in various other minor respects (for example, they omit Armed Forces). These data should be used as employment weights for anything to do with Pay or Hours in the ASHE section of the LMI for All database. The reference to Working Futures in the documentation is intended simply to explain how the weights have been derived.</note>
+<note>The pay file provided (ashe_pay_main.csv) has columns Employment and PayBill. The former is the employment data that should be used as weights. The file is intended to be “self-contained”: there is no need to use the separate Working Futures data upon which it is based. The employment weights used are a subset of the Working Futures employment time series data set, but contain data for just one year. They differ from the Working Futures data in various other minor respects (for example, they omit Armed Forces). These data should be used as employment weights for anything to do with Pay or Hours in the ASHE section of the LMI for All database. The reference to Working Futures in the documentation is intended simply to explain how the weights have been derived.</note>
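When going up a level of aggregation (for example, from regions to the UK), the Employment column supplies the weights. The sketch below computes an employment-weighted mean over (pay, employment) pairs. The function name and figures are hypothetical, and the sketch ignores any other adjustment the API may apply.

```python
from typing import Iterable, Tuple


def weighted_mean_pay(cells: Iterable[Tuple[float, float]]) -> float:
    """Employment-weighted mean of pay over (pay, employment) pairs."""
    total_pay_x_emp = 0.0
    total_emp = 0.0
    for pay, employment in cells:
        total_pay_x_emp += pay * employment
        total_emp += employment
    if total_emp <= 0:
        raise ValueError("total employment must be positive")
    return total_pay_x_emp / total_emp


# Hypothetical: two regional cells combined into one higher-level estimate.
print(weighted_mean_pay([(500.0, 3000.0), (600.0, 1000.0)]))  # prints 525.0
```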
  
 ===Rounding of estimates===
 To avoid a false impression of precision, the API rounds the estimates before delivering the answer to any query. In the case of weekly pay, any number should be rounded to the nearest ten pounds.
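The sketch below implements the weekly pay rounding rule. The text does not say how exact halves are treated, so this sketch assumes they round up. Python's built-in `round()` rounds halves to even (for example, `round(545, -1)` gives 540), so the rule is written out explicitly. `round_weekly_pay` is a hypothetical name.

```python
import math


def round_weekly_pay(pay: float) -> int:
    """Round weekly pay to the nearest ten pounds, with halves rounding up."""
    return int(math.floor(pay / 10 + 0.5)) * 10


print(round_weekly_pay(556))  # prints 560
```

Under this rule, the pay estimate of 556 in the worked example above would be reported as 560.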
 ===Closing Notes===
data/specsheet_ashepay.1502467222.txt.gz · Last modified: 2017-08-11 16:00 by Luke Bosworth