I am working in R. I have two vectors, A and B, of lengths 5913 and 3733 respectively (with repeated entries). I want to extract the values (with repetitions) that are present in both A and B. I have computed A %in% B (let's call it C) and B %in% A (let's call it D). C has 3906 TRUE entries and D has 3607 (so 2007 elements of A are not in B and 126 elements of B are not in A).

But how do I find the common values? I don't think I can use intersect(), as it expects vectors with no duplicate values, and I have many, many repetitions.

Note: Because these vectors are so long, I couldn't include them here.

Any help would be greatly appreciated.

Thanks

user62198
  • Suppose A = c(1,1,2,2,3,3,4,4,5,5,6,6,9,11,15,17,19,19) and B = c(1,2,2,2,3,3,3,4,5,6,7,8,8,8) what will your result look like? One possibility is C=[1 1 2 2 3 3 4 4 5 5 6 6]. Another is D=[1 2 2 2 3 3 3 4 5 6]. Do you need E = [1 1 2 2 2 3 3 4 4 5 5 6 6]? – Biswajit Banerjee Dec 10 '15 at 23:29

3 Answers

Since you have multiple repetitions in A and B, and the two vectors have different lengths (5913 and 3733), it is expected that vectors C and D differ in size. However, C and D should still contain the same unique elements.

Let's take the example proposed by Biswajit Banerjee:

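A <- c(1,1,2,2,3,3,4,4,5,5,6,6,9,11,15,17,19,19)
B <- c(1,2,2,2,3,3,3,4,5,6,7,8,8,8)

C <- A[A %in% B]  # elements of A that also occur in B
D <- B[B %in% A]  # elements of B that also occur in A

# setequal() compares the unique values regardless of order or length
setequal(C, D)
[1] TRUE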

You could verify in the same way that your vectors C and D, of lengths 3906 and 3607, contain the same unique values. Does that solve your problem?

michaelg

Does C <- intersect(unique(A), unique(B)) work?

Then you can do A[A %in% C] to get them from A with duplicates.
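As a rough illustration of this suggestion, here is a minimal sketch using the example vectors from the comment above. The names `common`, `from_A` and `from_B` are only for illustration. Note that `intersect()` accepts vectors with duplicates and drops them itself, so the `unique()` calls are optional.

```r
A <- c(1,1,2,2,3,3,4,4,5,5,6,6,9,11,15,17,19,19)
B <- c(1,2,2,2,3,3,3,4,5,6,7,8,8,8)

# Unique values present in both vectors (duplicates are dropped by intersect())
common <- intersect(A, B)

# Pull the common values back out of each vector, keeping that vector's repetitions
from_A <- A[A %in% common]
from_B <- B[B %in% common]

print(common)  # 1 2 3 4 5 6
print(from_A)  # 1 1 2 2 3 3 4 4 5 5 6 6
print(from_B)  # 1 2 2 2 3 3 3 4 5 6
```

The two results differ in length because each one keeps the repetitions of the vector it was taken from.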

knb

I think @biswajit-banerjee's comment (and the others) should be helpful.

When you ask how to find the common values, I assume you are looking for one of the following:

  1. Values of A that exist in the other vector, or
  2. Values of B that exist in the other vector, or
  3. Unique values that exist in both vectors.

The third is tricky because you do not have vectors of equal lengths. Hence there is no index that could be mapped. So the option in that case should be the intersection of the unique values from each vector as @michaelg mentioned.
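There is possibly a fourth reading that the three cases above do not cover: keep each common value as many times as it occurs in both vectors, that is, the smaller of its two counts. The following is only a sketch under that assumption. `multiset_intersect` and `count_in` are hypothetical helper names, not base R functions, and the example vectors are the ones from the comment on the question.

```r
# Hypothetical helper: multiset-style intersection (assumes min-count semantics)
multiset_intersect <- function(x, y) {
  vals <- intersect(x, y)  # unique values present in both
  # Count how often each common value occurs in a vector;
  # non-matching elements map to 0, which tabulate() ignores
  count_in <- function(v) tabulate(match(v, vals, nomatch = 0L), nbins = length(vals))
  n <- pmin(count_in(x), count_in(y))  # smaller of the two counts per value
  rep(vals, times = n)
}

A <- c(1,1,2,2,3,3,4,4,5,5,6,6,9,11,15,17,19,19)
B <- c(1,2,2,2,3,3,3,4,5,6,7,8,8,8)

res <- multiset_intersect(A, B)
print(res)  # 1 2 2 3 3 4 5 6
```

Here 1 appears twice in A but once in B, so it is kept once; 2 appears twice in A and three times in B, so it is kept twice.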

Here are a few ways depending on what you want to do:

list1 = c("item1", "item2", "item3", "item1", "item8", "item2")
list2 = c("item8", "item9", "item10", "item1")

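index <- match(list1, list2) # case 1: position in list2 of each list1 element (NA if absent)
result1 <- list2[na.omit(index)] # values of list1 found in list2, with list1's repetitions

index <- match(list2, list1) # case 2: position in list1 of each list2 element (NA if absent)
result2 <- list1[na.omit(index)] # values of list2 found in list1, with list2's repetitions

unique(result1) # case 3; unique(result2) gives the same values, possibly in a different order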

Output for each case:

4 NA NA  4  1 NA # index
"item1" "item1" "item8" # list2[na.omit(index)]

5 NA NA  1 # index
"item8" "item1" # list1[na.omit(index)]

"item1" "item8" # result1
"item8" "item1" # result2
Drj