I have a dataset of 50K user records (one per email) and need to select 10K of them so that the values in each category (Region, i.e. the Geo column, Role and Position) follow predefined ratios.
For example, given the following sample of 11 rows, how can I subset it to get 5 rows split as follows:
- Geo: 80% AMER, 20% INDIA
- Role: 60% Sales, with the remaining 40% picked at random from the other roles
- Position: 20% Manager, 80% Operational
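The ratios above have to become whole-row targets before any rows can be picked. A minimal sketch, assuming the ratios are given as whole-number percentages and that every Role other than Sales is lumped into a hypothetical `other` bucket; `target_counts` is an illustrative name. It uses largest-remainder rounding so the counts always add up to the requested size:

```python
def target_counts(n, percentages):
    """Turn whole-number percentages (summing to 100) into row counts summing to n.

    Largest-remainder rounding: floor each share with integer arithmetic,
    then give the rows lost to flooring to the categories with the largest remainders.
    """
    if sum(percentages.values()) != 100:
        raise ValueError("percentages must sum to 100")
    counts, remainders = {}, {}
    for category, pct in percentages.items():
        counts[category], remainders[category] = divmod(n * pct, 100)
    leftover = n - sum(counts.values())
    for category in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[category] += 1
    return counts


# "other" is a hypothetical bucket for every Role that is not Sales.
splits = {
    "Geo": {"AMER": 80, "INDIA": 20},
    "Role": {"Sales": 60, "other": 40},
    "Position": {"Manager": 20, "Operational": 80},
}
for n in (5, 10_000):
    print(n, {name: target_counts(n, pct) for name, pct in splits.items()})
```

For n = 5 this gives 4 AMER / 1 INDIA, 3 Sales / 2 other and 1 Manager / 4 Operational; for 10K it gives 8000/2000, 6000/4000 and 2000/8000.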
Email              Geo    Role       Position
[email protected]  AMER   Sales      Manager
[email protected]  AMER   Sales      Operational
[email protected]  EMEA   Sales      Manager
[email protected]  AMER   Sales      Operational
[email protected]  EMEA   Sales      Manager
[email protected]  AMER   Testing    Operational
[email protected]  AMER   Testing    Manager
[email protected]  INDIA  Testing    Manager
[email protected]  INDIA  Testing    Operational
[email protected]  AMER   Marketing  Operational
[email protected]  INDIA  Marketing  Operational
I expect to get something like this:
Email              Geo    Role       Position
[email protected]  AMER   Sales      Manager
[email protected]  AMER   Sales      Operational
[email protected]  AMER   Sales      Operational
[email protected]  AMER   Testing    Operational
[email protected]  INDIA  Marketing  Operational
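As a quick check that this expected output meets the split, here is a small Python sketch that counts each category in those five rows. Emails are left out since only the category columns matter, and every role other than Sales is grouped under a hypothetical `other` label:

```python
from collections import Counter

# (Geo, Role, Position) of the five expected rows; emails omitted.
expected = [
    ("AMER", "Sales", "Manager"),
    ("AMER", "Sales", "Operational"),
    ("AMER", "Sales", "Operational"),
    ("AMER", "Testing", "Operational"),
    ("INDIA", "Marketing", "Operational"),
]


def split(rows):
    """Count each category; roles are reduced to Sales vs. "other"."""
    geo = Counter(g for g, _, _ in rows)
    role = Counter("Sales" if r == "Sales" else "other" for _, r, _ in rows)
    position = Counter(p for _, _, p in rows)
    return geo, role, position


geo, role, position = split(expected)
print(dict(geo), dict(role), dict(position))
# {'AMER': 4, 'INDIA': 1} {'Sales': 3, 'other': 2} {'Manager': 1, 'Operational': 4}
```

That is 4/5 = 80% AMER, 3/5 = 60% Sales and 1/5 = 20% Manager, as requested.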
I'm aware there can be more than one valid solution, especially with more data; any one is fine as long as the predefined ratios are respected.
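One possible approach (not the only one) is a randomised greedy pass with restarts: shuffle the rows, accept a row only while it keeps every category within its target count, and start over if it gets stuck. A minimal Python sketch, assuming the targets have already been worked out as whole-row counts that sum to n in each category and that roles other than Sales are grouped under a hypothetical `other` label; `sample_with_quotas`, `category_keys`, `SAMPLE` and `QUOTAS` are illustrative names:

```python
import random
from collections import Counter

# The 11 sample rows as (Geo, Role, Position); emails omitted.
SAMPLE = [
    ("AMER", "Sales", "Manager"), ("AMER", "Sales", "Operational"),
    ("EMEA", "Sales", "Manager"), ("AMER", "Sales", "Operational"),
    ("EMEA", "Sales", "Manager"), ("AMER", "Testing", "Operational"),
    ("AMER", "Testing", "Manager"), ("INDIA", "Testing", "Manager"),
    ("INDIA", "Testing", "Operational"), ("AMER", "Marketing", "Operational"),
    ("INDIA", "Marketing", "Operational"),
]

# Target counts for n = 5; the counts in each dimension must sum to n.
QUOTAS = [
    {"AMER": 4, "INDIA": 1},           # Geo (EMEA gets 0)
    {"Sales": 3, "other": 2},          # Role: Sales vs. anything else (hypothetical label)
    {"Manager": 1, "Operational": 4},  # Position
]


def category_keys(row):
    """Map a row to its key in each quota dimension."""
    geo, role, position = row
    return (geo, "Sales" if role == "Sales" else "other", position)


def sample_with_quotas(rows, quotas, n, seed=0, max_tries=1000):
    """Return sorted indices of n rows whose category counts hit every quota
    exactly, or None if no such subset turned up within max_tries restarts."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        order = list(range(len(rows)))
        rng.shuffle(order)
        used = [Counter() for _ in quotas]
        chosen = []
        for i in order:
            keys = category_keys(rows[i])
            # Accept the row only if no dimension would exceed its quota.
            if all(used[d][k] < quotas[d].get(k, 0) for d, k in enumerate(keys)):
                for d, k in enumerate(keys):
                    used[d][k] += 1
                chosen.append(i)
                if len(chosen) == n:
                    # Quotas sum to n per dimension, so every one is now met exactly.
                    return sorted(chosen)
    return None  # stuck on every attempt


picked = sample_with_quotas(SAMPLE, QUOTAS, 5)
print([SAMPLE[i] for i in picked])
```

Because each category's targets add up to n, reaching n accepted rows means every target is hit exactly. On the sample, any result has to include all three AMER Sales rows (the EMEA Sales rows are ruled out by the Geo split), and there are four valid subsets in total. Each attempt is a single pass over the rows, so the same idea should be workable for 50K to 10K, but it is a heuristic: it can fail to find a subset even when one exists, and no method can help if the data lacks enough rows of some combination (for example, too few AMER Managers). An exact method would formulate the selection as an integer program.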