Changelog
New experimental features of version 2.0
Installation
General description
ASW faultlines for multiple subgroups
Thatcher’s Fau (Thatcher et al., 2003)
Faultline Strength * Faultline Distance (Bezrukova et al., 2009)
Multiple correlations (van Knippenberg et al., 2011)
Subgroup Strength (Gibson & Vermeulen, 2003)
FLS (Shaw, 2004)
PMD_cat (Trezzini, 2008)
Faultlines based on Latent Class Cluster Analysis (Lawrence & Zyphur, 2011)
References

Package code by Andreas Glenz and Dominik Dilba, documentation by Bertolt Meyer

This is the supplemental material for:

Meyer, B. & Glenz, A. (2013). Team faultline measures: A computational comparison and a new approach to multiple subgroups. Organizational Research Methods, 16, 393-424. https://doi.org/10.1177/1094428113484970

Current Version: 2.10.2 Last update: 2023-02-23

Changelog

2.10.2 Fixed a bug introduced by R 4.0.0 that caused faultlines() to throw an error

2.10.1 New version for hosting/installing via gitHub

New experimental features of version 2.0

Ego-Faultlines: Determine faultlines from each group members individual perspective, by using the parameter i.level = TRUE.
Subgroup Homogeneity: Alternative method for silhouette-width calculation that accounts for outgroup homogeneity (parameter usesghomo = TRUE).
Attribute weights: Use weighed average of single-attribute faultlines as a new method to account for differences in attribute relevances (parameter by.attr = TRUE).

Installation

The package is now hosted on GitHub and can be installed accordingly:

library(devtools)
install_github("bertolt01/asw.cluster")

General description

The core function of the package is the faultlines() function. It expects a data frame (tibbles are not supported) passed with the data = parameter containing the members of one or more teams as rows and their diversity attributes that are used for calculating faultline strength as columns. Note that all columns of this data set will be used for calculating the faultline. Thus, this will most likely be a subset of a larger data frame containing more variables. If the data set contains more than one team, it must contain a column that specifies a team number or ID for each team member, thus indicating team membership. The name of this column must be passed to the argument group.par.

The attribute type (numeric or nominal) must be specified for each diversity attribute in the data frame. These attribute type definitions must be in the same order as the variables in the data frame. These attribute types are defined wi the parameter attr.typeFor example, if the data set contains the variables age (numeric in years), ethnicity (character factor), and gender (character factor), this parameter must be specified as attr.type = c("numeric", "nominal", "nominal")

Note that the faultline measures proposed by Shaw (2004) and Trezzini (2008) require that all attributes are nominal. Thus, prior to calculating diversity faultline strengths with these two methods, numeric attributes such as age must be recoded to factors with levels such as ‘young’, ‘middle-aged’, and ‘old’ and the attribute type of this variable must be set to nominal.

For faultline measures that can deal with numeric attributes (types "thatcher", "bezrukova", or "asw"), a weight for each diversity attribute must be specified with the attr.weight parameter. These weights indicate how strong a difference of 1 (in case of numeric attributes) or a different category (in case of nominal attributes) is fractured into the faultline. In the example case of age (in years), gender, and ethnicity, specifying this parameter as attr.weight = c(0.1, 1, 1) means that an age difference of ten years is equally weighted as a difference in gender, which is equally weighted as a difference in ethnicity. Note that these are the default values for Thatcher’s et al. (2003) Fau that are probably used in most papers, but these appear to be arbitrary. More research is required with regard to the choice of these weights in a given context. Note that the parameter rescale and the parameter attr.weight interact in such a way that numeric attributes are first rescaled according to rescale before all attributes (with dummy-coded values for nominal attributes) are multiplied by their appropriate weight given by the attr.weight parameter.

The method parameter specifies the type of faultline measure. Currently, methods "asw", "thatcher", "shaw", "bezrukova", "trezzini", "knippenberg", "lcca", "gibson" are implemented. See below for details.

The metric parameter specifies whether Euclidean or Mahalanobis distances should be employed in determining how different team members are from each other. This metric is only employed by the methods “thatcher”, “bezrukova”, and “asw”. Note that the former two methods (Bezrukova et al., 2009; Thatcher et al., 2003) were introduced based on Euclidean distances, which assume that diversity attributes are uncorrelated. Meyer and Glenz (2013) showed that correlations between diversity attributes (e.g., between age and tenure) can have a significant influence on diversity faultline measures. They thus suggested to employ Mahalanobis distances to control for such correlations. They explicitly included this option in the calculation of the ASW measure, but invoking it for Thatcher’s et al. Fau or for Bezrukova’s Faultline Distance measure is purely experimental. Employing Mahalanobis distances for the latter two measures will thus deliver a measure that has not been described in the literature. Furthermore, calculating Mahalanobis distances requires an inversion of the variance-covariance-martix of attributes. Using Mahalanobis metrics is therefore restricted to data sets with invertible variance-/covariance matrices, i.e., to numeric attributes only.

ASW faultlines for multiple subgroups

ASW is the only diversity faultline measure that is suitable for the case where more than two homogeneous subgroups are possible in a given team. Here is an example for how to calculate ASW for an example data set teamdata_sub. It consists of two teams with six members each with the diversity attributes age, gender, and ethnicity.

Note that it is important to set the variable types correctly: numeric for numeric attributes and factor for nominal-level attributes. Variables of type character will throw errors!

teamdata_sub <- data.frame(teamid = c(rep(1,6),rep(2,6)), 
  age = c(44,18,40,33,33,50,22,23,39,42,57,51), 
  gender = factor(c("f","m","f","f","m","f","f","f","m","m","m","m")), 
  ethnicity = factor(c("A","B","A","D","C","B","A","A","B","B","C","C")))

str(teamdata_sub)

## 'data.frame':    12 obs. of  4 variables:
##  $ teamid   : num  1 1 1 1 1 1 2 2 2 2 ...
##  $ age      : num  44 18 40 33 33 50 22 23 39 42 ...
##  $ gender   : Factor w/ 2 levels "f","m": 1 2 1 1 2 1 1 1 2 2 ...
##  $ ethnicity: Factor w/ 4 levels "A","B","C","D": 1 2 1 4 3 2 1 1 2 2 ...

teamdata_sub

##    teamid age gender ethnicity
## 1       1  44      f         A
## 2       1  18      m         B
## 3       1  40      f         A
## 4       1  33      f         D
## 5       1  33      m         C
## 6       1  50      f         B
## 7       2  22      f         A
## 8       2  23      f         A
## 9       2  39      m         B
## 10      2  42      m         B
## 11      2  57      m         C
## 12      2  51      m         C

The first team appears to be rather heterogeneous and cross-cut, but the second team appears to consist of three rather homogeneous subgroups. To calculate the ASW faultline measure for both teams, we need to specify the scales of the diversity attributes age, gender, and ethnicity as being numeric, nominal, and nominal. Instead of specifying the scale types in the call to the faultlines() function, they can also be stored in a variable that can be passed to the function:

my_attr <- c("numeric", "nominal", "nominal")

The ASW faultline algorithm also needs to know how to weigh the attributes, i.e., how much age difference is seen as equivalent to a difference in gender or ethnicity. Following the example in the introduction of this section, these can be stored in a variable as well:

my_weights <- c(0.1, 1, 1)

After these considerations have been made, the faultlines can be calculated with the results being stored in a data frame that we call my_ASW. Note how in the call to the faultline() function, the name of the data frame containing the demographic information and the name of the variable in that data frame specifying team membership are also passed as parameters:

library(asw.cluster)
my_ASW <- faultlines(data = teamdata_sub, 
                     group.par = "teamid", 
                     attr.type = my_attr, 
                     attr.weight = my_weights,
                     method = "asw")

## Group: 1   Groupsize: 6 
## Group: 2   Groupsize: 6

my_ASW

##   team          fl.value mbr_to_subgroups number_of_subgroups subgroup_sizes
## 1    1 0.331799912553719      1 2 1 2 2 1                   2            3 3
## 2    2 0.805489497404053      1 1 2 2 3 3                   3          2 2 2

The resulting value is a data frame with each line representing a team. The first column team denotes the team number and the second, fl.value, denotes the faultline strength for the given group (the ASW value). The column mbr_to_subgroups indicates to which subgroup each member belongs. Members are listed left-to right with reference to the top-to-bottom order of the data frame containing the raw data. The column number_of_subgroups indicates how many subgroups the algorithm detected in the given team, and the last column subgroup_sizes lists the sizes of the subgroups.

Individual-level summary for merging with the original data frame

summary() gives a more detailed summary:

summary(my_ASW)

## Number of Teams: 2 
## 
## Calculation features: 
##  Method:      ASW 
##  Level:   team 
##  Metric:      euclid 
## 
## 
## Team 1 (1):
## ===========
## Faultline Strength:
## [1] 0.3317999
## 
## Individual Faultline Strengths (silhouette widths):
## [1]  0.6233902  0.3507280  0.4973520 -0.1024386  0.1353660  0.4864019
## 
## Member to Subgroup Association:
## [1] 1 2 1 2 2 1
## 
## Number of Subgroups:
## [1] 2
## 
## Distances:
##          X1       X2        X3        X4        X5        X6
## 1  0.000000 26.03843  4.000000 11.045361 11.090537  6.082763
## 2 26.038433  0.00000 22.045408 15.066519 15.033296 32.015621
## 3  4.000000 22.04541  0.000000  7.071068  7.141428 10.049876
## 4 11.045361 15.06652  7.071068  0.000000  1.414214 17.029386
## 5 11.090537 15.03330  7.141428  1.414214  0.000000 17.058722
## 6  6.082763 32.01562 10.049876 17.029386 17.058722  0.000000
## 
## 
## Team 2 (2):
## ===========
## Faultline Strength:
## [1] 0.8054895
## 
## Individual Faultline Strengths (silhouette widths):
## [1] 0.9570891 0.9555946 0.8343080 0.8094112 0.6892723 0.5872618
## 
## Member to Subgroup Association:
## [1] 1 1 2 2 3 3
## 
## Number of Subgroups:
## [1] 3
## 
## Distances:
##         X1       X2       X3        X4       X5        X6
## 1  0.00000  1.00000 17.05872 20.049938 35.02856 29.034462
## 2  1.00000  0.00000 16.06238 19.052559 34.02940 28.035692
## 3 17.05872 16.06238  0.00000  3.000000 18.02776 12.041595
## 4 20.04994 19.05256  3.00000  0.000000 15.03330  9.055385
## 5 35.02856 34.02940 18.02776 15.033296  0.00000  6.000000
## 6 29.03446 28.03569 12.04159  9.055385  6.00000  0.000000

The result of summary also contains an individual-level result where each row represents a team member for merging with the original data (teamdata_sub in this example):

my_ASW_long <- summary(my_ASW)$long
kable(my_ASW_long) %>% kable_styling()

team	teamsize	fl.value	fl.mbr	mbr_to_subgroups	number_of_subgroups	subgroup_size	own_subgroup_size
1	6	0.3317999	avg	1	2	3	NA
1	6	0.3317999	avg	2	2	3	NA
1	6	0.3317999	avg	1	2	3	NA
1	6	0.3317999	avg	2	2	3	NA
1	6	0.3317999	avg	2	2	3	NA
1	6	0.3317999	avg	1	2	3	NA
2	6	0.8054895	avg	1	3	2	NA
2	6	0.8054895	avg	1	3	2	NA
2	6	0.8054895	avg	2	3	2	NA
2	6	0.8054895	avg	2	3	2	NA
2	6	0.8054895	avg	3	3	2	NA
2	6	0.8054895	avg	3	3	2	NA

teamdata <- data.frame(teamdata_sub, my_ASW_long)
kable(teamdata) %>% kable_styling()

teamid	age	gender	ethnicity	team	teamsize	fl.value	fl.mbr	mbr_to_subgroups	number_of_subgroups	subgroup_size	own_subgroup_size
1	44	f	A	1	6	0.3317999	avg	1	2	3	NA
1	18	m	B	1	6	0.3317999	avg	2	2	3	NA
1	40	f	A	1	6	0.3317999	avg	1	2	3	NA
1	33	f	D	1	6	0.3317999	avg	2	2	3	NA
1	33	m	C	1	6	0.3317999	avg	2	2	3	NA
1	50	f	B	1	6	0.3317999	avg	1	2	3	NA
2	22	f	A	2	6	0.8054895	avg	1	3	2	NA
2	23	f	A	2	6	0.8054895	avg	1	3	2	NA
2	39	m	B	2	6	0.8054895	avg	2	3	2	NA
2	42	m	B	2	6	0.8054895	avg	2	3	2	NA
2	57	m	C	2	6	0.8054895	avg	3	3	2	NA
2	51	m	C	2	6	0.8054895	avg	3	3	2	NA

the own_subgroup column is only used for individual-level faultlines as described below.

Scaling attributes

To circumvent the issue of assigning arbitrary weights for attr.weight (e.g., a difference in ten years of age equals a difference in gender), Bezrukova et al. (2009) recommend to scale numeric attributes by their standard deviation, and to dummy code nominal attributes with 0 and 1/√2. The latter results in an Euclidean distance of one between nominal attributes. This scaling is used by default in all papers employing the Fau * Dist faultline measure, and we also recommend to employ this scaling when calculating ASW faultlines. The application of this scaling is illustrated in the following example:

my_ASW_sc <- faultlines(data = teamdata_sub,
                               group.par = "teamid",
                               attr.type = my_attr,
                               rescale = "sd",
                               method = "asw")

## Group: 1   Groupsize: 6 
## Group: 2   Groupsize: 6

my_ASW_sc

##   team          fl.value mbr_to_subgroups number_of_subgroups subgroup_sizes
## 1    1 0.337558681138197      1 2 1 1 2 1                   2            4 2
## 2    2 0.821859114925443      1 1 2 2 3 3                   3          2 2 2

Note how the use of scaling makes the specification of weights unnecessary. Also note how the scaling of attributes leads to different results.^

Controlling for correlated attributes

ASW can control for the correlation between numeric attributes when calculating the faultline. In the following example, the attributes age and tenure are correlated:

mynumericdata <- data.frame(teamid = c(rep(1,6),rep(2,6)), 
                     age = c(44,18,40,33,33,50,22,23,39,42,57,51), 
                     tenure = c(12,2.5,11,3,5,5,2,1,12,13,20,22))

mynumericdata

##    teamid age tenure
## 1       1  44   12.0
## 2       1  18    2.5
## 3       1  40   11.0
## 4       1  33    3.0
## 5       1  33    5.0
## 6       1  50    5.0
## 7       2  22    2.0
## 8       2  23    1.0
## 9       2  39   12.0
## 10      2  42   13.0
## 11      2  57   20.0
## 12      2  51   22.0

with(mynumericdata, cor.test(age, tenure))

## 
##  Pearson's product-moment correlation
## 
## data:  age and tenure
## t = 4.5062, df = 10, p-value = 0.001132
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4614055 0.9473970
## sample estimates:
##       cor 
## 0.8185531

If this correlatio0n is not taken into account, it distorts the faultline strength. In this example, if the correlation is not accounted for, ASW assumes that the two attributes are uncorrelated and returns a stronger faultline strength for the second team than for the first team:

my_num_attr <- c("numeric", "numeric") 
my_num_weights <- c(1,1)

my_ASW <- faultlines(data = mynumericdata, 
                            group.par = "teamid", 
                            attr.type = my_num_attr, 
                            attr.weight = my_num_weights, 
                            method = "asw")

## Group: 1   Groupsize: 6 
## Group: 2   Groupsize: 6

my_ASW

##   team          fl.value mbr_to_subgroups number_of_subgroups subgroup_sizes
## 1    1 0.468056940912543      1 2 1 2 2 1                   2            3 3
## 2    2   0.7792993906299      1 1 2 2 3 3                   3          2 2 2

However, the tenure of the members of the second team is pretty much in line with what one would expect given their age; only the team member whose age is 57 has a lower tenure than her college with 51. So there is actually a difference between the last two group members because for one of them, the tenure is not what one would expect given its strong correlation with age. Therefore, controlling for the correlation by employing the Mahalanobis metric in determining how similar people are, leads to a different solution:

my_ASW_m <- faultlines(data = mynumericdata, 
                       group.par = "teamid", 
                       attr.type = my_num_attr, 
                       attr.weight = my_num_weights, 
                       method = "asw", 
                       metric = "mahal")

## Group: 1   Groupsize: 6 
## Group: 2   Groupsize: 6

my_ASW_m

##   team          fl.value mbr_to_subgroups number_of_subgroups subgroup_sizes
## 1    1 0.465347846418796      1 2 1 3 3 4                   4        2 1 2 1
## 2    2 0.338315622749952      1 1 2 2 3 4                   4        2 2 1 1

This solution is drastically different from the previous one: The number of subgroups changes and the second team now has a weaker faultline than the first team, whereas in the previous case, it was the other way around. This example illustrates the considerations that have to be made if diversity attributes are correlated.

Individual-level faultlines (experimental)

ASW team-level faultline strength is determined by maximizing the average average silhouette width across all team members (see Meyer & Glenz, 2014, for details). Specifying i.level = TRUE changes the algorithm to maximize for each individual team members’ individual silhouette width, which results in a unique subgroup configuration and faultline strength from each team members’ perspective.

Thus, a team with n members delivers n individual faultline solutions for i.level = TRUE. In addition to an n-sized vector of faultline strengths, i.level = TRUE also delivers an n x n adjacency matrix, indicating in each cell the portion of the n faultline configurations that assign the corresponding pair of team-members to the same subgroup.

Limitimg the maximum number of subgroups

The maxgroups parameter allows limiting the maximum number of subgroups that ASW can detect for each team. The default is 6. Only employed if method = "asw".

Thatcher’s Fau (Thatcher et al., 2003)

Fau (Thatcher et al., 2003) assumes the existence of two homogeneous subgroups. In the following, we show how to calculate it for the example data set teamdata_sub introduced above

As with ASW faultlines, the types of the diversity attributes age, gender, and ethnicity (i.e., numeric, nominal, and nominal) must be defined

my_attr <- c("numeric", "nominal", "nominal")

The Fau faultline algorithm also needs to know how to weigh the attributes, i.e., how much age difference is seen as equivalent to a difference in gender or ethnicity. Following the example in the introduction of this section, these can be stored in a variable as well:

my_weights <- c(0.1, 1, 1)

Calculatte Fau:

my_Fau <- faultlines(data = teamdata_sub, 
                     group.par = "teamid", 
                     attr.type = my_attr, 
                     attr.weight = my_weights,
                     method = "thatcher")

## Note: The selected method limits the Number of subgroups to 2 
## Group: 1   Groupsize: 6 
## Group: 2   Groupsize: 6

my_Fau

##   team          fl.value mbr_to_subgroups number_of_subgroups subgroup_sizes
## 1    1 0.551343900758098      1 2 1 2 2 1                   2            3 3
## 2    2 0.774778652238072      1 1 2 2 2 2                   2            2 4

In the resulting data frame is formatted in the same way as in the example above: each line represents a team. The first column denotes the team number and the second, fl.value, the faultline strength (the Fau value in this case). Note that when calculating Fau, the number of subgroups is always fixed to 2 This may be inapropriate as in this case, as it clearly misrepresents the subgroup configuration of the second team. We thus discourage this algorithm for teams in which more than two subgroups are possible.

The summary() command works in the same way as described above.

Faultline Strength * Faultline Distance (Bezrukova et al., 2009)

Bezrukova et al., (2009) suggested multiplying Thatcher’s Fau for a given team with the Euclidean distance between the two subgroup centroids. Given that the Euclidean distance is already fractured into the calculation of Fau, Meyer and Glenz (2013) showed that this does not add more information to what Fau already delivers. We thus discourage using this faultline measure. To use it anyways, specify method = "bezrukova" as shown in this example:

my_bezrukova <- faultlines(data = teamdata_sub, 
                group.par = "teamid", 
                attr.type = my_attr, 
                attr.weight = my_weights, 
                method = "bezrukova")

## Note: The selected method limits the Number of subgroups to 2 
## Group: 1   Groupsize: 6 
## Group: 2   Groupsize: 6

my_bezrukova

##   team         fl.value mbr_to_subgroups number_of_subgroups subgroup_sizes
## 1    1 9.20192071095432      1 2 1 2 2 1                   2            3 3
## 2    2 19.2031432721388      1 1 2 2 2 2                   2            2 4

Note that when calculating the Faultline Strength * Faultline Distance measure, the number of subgroups is always fixed to 2. Also note how the multiplication of the Fau value with the Euclidean distance results in a value that is outside the range of 0 and 1.

The summary() command works in the same way as described above.

Multiple correlations (van Knippenberg et al., 2011)

Earlier versions of this package also included the measure by van Knippenberg et al. (2011) (invoked by method = "knippenberg"). It operationalizes faultline strength as the product of the multiple correlations between diversity attributes. It does not deliver the number of subgroups, nor a member-to-subgroup association. The measure also works with numeric attributes and thus also requires specifying attribute types as numeric or nominal like ASW.

We do not recommend using this measure because it has a built-in flaw: As it is the product of the correlations of diversity attributes, the measure is 0 if two attributes are uncorrelated, regardless of the correlations among the other attributes. We are unaware of an application of this measure outside of its original application.

my_knippenberg <- faultlines(data = teamdata_sub, 
                             group.par = "teamid", 
                             attr.type = my_attr, 
                             method = "knippenberg")

## Group: 1   Groupsize: 6 
## # weights:  7 (6 variable)
## initial  value 4.158883 
## iter  10 value 0.014438
## iter  20 value 0.000803
## iter  30 value 0.000215
## final  value 0.000090 
## converged
## # weights:  2 (1 variable)
## initial  value 4.158883 
## final  value 3.819085 
## converged
## # weights:  20 (12 variable)
## initial  value 8.317766 
## iter  10 value 2.177796
## iter  20 value 0.012237
## iter  30 value 0.000171
## final  value 0.000081 
## converged
## # weights:  8 (3 variable)
## initial  value 8.317766 
## final  value 7.977968 
## converged
## Group: 2   Groupsize: 6 
## # weights:  6 (5 variable)
## initial  value 4.158883 
## iter  10 value 0.007416
## iter  20 value 0.001247
## iter  30 value 0.000270
## iter  40 value 0.000164
## final  value 0.000090 
## converged
## # weights:  2 (1 variable)
## initial  value 4.158883 
## final  value 3.819085 
## converged
## # weights:  15 (8 variable)
## initial  value 6.591674 
## iter  10 value 0.137195
## iter  20 value 0.000580
## final  value 0.000057 
## converged
## # weights:  6 (2 variable)
## initial  value 6.591674 
## final  value 6.591674 
## converged

my_knippenberg

##   team          fl.value mbr_to_subgroups number_of_subgroups subgroup_sizes
## 1    1 0.993492567338543               NA                  NA             NA
## 2    2 0.988676580130105               NA                  NA             NA

Subgroup Strength (Gibson & Vermeulen, 2003)

The measure by Gibson & Vermeulen (2003), invoked by method = "gibson", quantifies the extent to which attributes overlap between the dyads that can be formed between all members of a team. Although this is equivalent to finding latent subgroup separations, theis method does not reveal the boundaries of those subgroups, i.e. the member-to-subgroup association, nor does it provide an estimation of the number of subgroups.

As the measure by does not support the weighting of attributes, no weighting variable is required.

my_gibson <- faultlines(data = teamdata_sub, 
                        group.par = "teamid", 
                        attr.type = my_attr, 
                        method = "gibson")

## Group: 1   Groupsize: 6 
## Group: 2   Groupsize: 6

my_gibson

##   team          fl.value mbr_to_subgroups number_of_subgroups subgroup_sizes
## 1    1 0.705047458357244               NA                  NA             NA
## 2    2  1.00337746934212               NA                  NA             NA

FLS (Shaw, 2004)

Shaw`s (2004) FLS quantifies the extent to which categorical attributes are aligned within subgroups, and deviate between subgroups. Thus, this measure is only suitable for categorical data and is thus not suitable for the example data set employed so far, because that contains the numeric variable age. Thus, Shaw’s FLS requires recoding the age variable to a nominal scale, e.g. by employing categories for certain age ranges. The following code produces another data set that is based on the previous example but categorized the age variable:

mycategorialdata <- data.frame(teamid = c(rep(1,6),rep(2,6)), 
                               age = factor(c("40 to 50","18 to 25","40 to 49","30 to 39","30 to 39",
                                       "50 to 59","18 to 25", "18 to 25","30 to 39",
                                       "40 to 49", "50 to 59","50 to 59")), 
                               gender = factor(c("f","m","f","f","m","f","f","f","m","m","m","m")), 
                               ethnicity = factor(c("A","B","A","D","C","B","A","A","B","B","C","C")))

Subsequently, FLS can be computed:

my_cat_attr <- c("nominal", "nominal", "nominal")

my_FLS <- faultlines(data = mycategorialdata, 
                     group.par = "teamid", 
                     attr.type = my_cat_attr, 
                     method = "shaw")

## Group: 1   Groupsize: 6 
## Group: 2   Groupsize: 6

my_FLS

##   team          fl.value mbr_to_subgroups number_of_subgroups subgroup_sizes
## 1    1         0.0546875               NA                  NA             NA
## 2    2 0.603205128205128               NA                  NA             NA

PMD_cat (Trezzini, 2008)

Trezzini (2008) operationalized faultline strength as the degree of polarized multi-dimensional subgroup diversity for categorial attributes. Thus, the measure is only suitable for categorical data. We thus use the mycategorialdata data frame created in the previous example on FLS. PMD_cat is invoked in a similar way as FLS:

my_PMD <- faultlines(data = mycategorialdata, 
                     group.par = "teamid", 
                     attr.type = my_cat_attr, 
                     method = "trezzini")

## Group: 1   Groupsize: 6 
## Group: 2   Groupsize: 6

my_PMD

##   team          fl.value mbr_to_subgroups number_of_subgroups subgroup_sizes
## 1    1 0.216049382716049               NA                  NA             NA
## 2    2 0.339506172839506               NA                  NA             NA

Faultlines based on Latent Class Cluster Analysis (Lawrence & Zyphur, 2011)

Lawrence and Zyphur (2011) proposed latent class cluster analysis (LCCA), also referred to as latent class analysis (LCA), for identifying faultlines in a stepwise way. First, several latent class solutions with different clusters are obtained over the data of a given team, where the clusters represent the subgroups. Out of these possible latent cluster solutions, the best-fitting one is identified by the lowest Bayesian information criterion (BIC) value. Each team member is then assigned to a subgroup based on the posterior probabilities for a given individual to belong to a certain class. As high posterior probabilities are likely in the case of homogeneous clusters, the homogeneity of posterior probabilities of all group members, which is determined with the entropy measure, can be employed as a measure of faultline strength.

As Meyer and Glenz (2013) show, this measure has certain practical limitations when applied to small group data. Its largest limitation lies in its insensitivity to different levels of homogeneity. It is biased towards strong faultlines and often fails to converge for very homogeneous small subgroups as in this example. Its usefulness for small group data is therefore questionable.

LCCA-based faultlines can be calculated by invoking the method = “lcca” parameter. They do not require a scaling of attributes.

my_attr <- c("numeric", "nominal", "nominal")

my_lcca <- faultlines(data = teamdata_sub,
                             group.par = "teamid",
                             attr.type = my_attr,
                             method = "lcca")

## Group: 1   Groupsize: 6 
## 1 : * * * * * * * * * *
## 2 : * * * * * * * * * *
## 3 : * * * * * * * * * *
## 4 : * * * * * * * * * *
## 5 : * * * * * * * * * *
## 6 : * * * * * * * * * *
## Group: 2   Groupsize: 6 
## 1 : * * * * * * * * * *
## 2 : * * * * * * * * * *
## 3 : * * * * * * * * * *
## 4 : * * * * * * * * * *
## 5 : * * * * * * * * * *
## 6 : * * * * * * * * * *

my_lcca

##   team fl.value mbr_to_subgroups number_of_subgroups subgroup_sizes
## 1    1       NA               NA                  NA              1
## 2    2       NA               NA                  NA              2

References

Bezrukova, K., Jehn, K. A., Zanutto, E. L., & Thatcher, S. M. B. (2009). Do workgroup faultlines help or hurt? A moderated model of faultlines, team identification, and group performance. Organization Science, 20, 35-50. https://doi.org/10.1287/orsc.1080.0379

Gibson, C., & Vermeulen, F. (2003). A healthy divide: Subgroups as a stimulus for team learning behavior. Administrative Science Quarterly, 48, 202-239. https://doi.org/10.2307/3556657

Lawrence, B., & Zyphur, M. (2011). Identifying organizational faultlines with latent class cluster analysis. Organizational Research Methods, 14, 32-57. https://doi.org/10.1177/1094428110376838

Meyer, B. & Glenz, A. (2013). Team faultline measures: A computational comparison and a new approach to multiple subgroups. Organizational Research Methods, 16, 393–424. https://doi.org/10.1177/1094428113484970

Shaw, J. (2004). The development and analysis of a measure of group faultlines. Organizational Research Methods, 7, 66-100. https://doi.org/10.1177/1094428103259562

Thatcher, S., Jehn, K., & Zanutto, E. (2003). Cracks in diversity research: The effects of diversity faultlines on conflict and performance. Group Decision and Negotiation, 12, 217-241. https://doi.org/10.1023/A:1023325406946

Trezzini, B. (2008). Probing the group faultline concept: An evaluation of measures of patterned multi-dimensional group diversity. Quality and Quantity, 42, 339-368. https://doi.org/10.1007/s11135-006-9049-z

van Knippenberg, D., Dawson, J., West, M., & Homan, A. (2011). Diversity faultlines, shared objectives, and top management team performance. Human Relations, 64, 307-336. https://doi.org/10.1177/0018726710378384

Zanutto, E. L., Bezrukova, K., & Jehn, K. A. (2010). Revisiting faultline conceptualization: Measuring faultline strength and distance. Quality & Quantity, 45(3), 701-714. https://doi.org/10.1007/s11135-009-9299-7

asw.cluster : R Package for Faultline Algorithms

Andreas Glenz and Bertolt Meyer and Dominik Dilba