Why we merge OpenAlex author profiles
OpenAlex sometimes splits the same researcher across two or more author IDs. Author Trends tries to detect those duplicates and combine them into one profile so the trend charts reflect a full career, not a sliced one. Here's how the merge works and where it can still get things wrong.
The problem in one example
Imagine a fictional aerospace researcher named John Doe. He published his first paper in 2002 at the Indian Institute of Technology Bombay, then continued at the same institution. Around 2012 he moved to MIT and started using a slightly different byline: "J. Doe" rather than "John Doe". He also registered a second ORCID along the way without realising the first existed.
From OpenAlex's perspective there are now two distinct author records:
- A11111111: "John Doe", ORCID 0000-0000-0000-0001, 112 works (2002–2014), affiliated mostly with IIT Bombay.
- A22222222: "J. Doe", ORCID 0000-0000-0000-0002, 98 works (2012–present), affiliated mostly with MIT.
Neither profile alone tells the full story. A11111111 misses the MIT years; A22222222 misses the IIT Bombay years. Each profile's h-index is lower than the full career would give, because the citation history is split between them. If a user clicks either profile, the country / institution / collaborator charts will be incomplete and the timeline will look like a researcher who suddenly stopped or suddenly started, with no continuity.
The heuristic
For every pair of search results returned by OpenAlex, Author Trends asks: do these look like the same person? We answer "yes" only when all three of these signals agree:
- Name tokens overlap. We tokenize each display name, drop initials (any token shorter than 3 characters) and check how many of the remaining tokens are shared. We need at least 2.
  John Doe → {john, doe}
  J. Doe → {doe}, so only one token is shared and this pair fails the name rule.
  But if the bylines had been John Doe vs John A. Doe: {john, doe} ∩ {john, doe} = 2. Pass.
- Affiliations overlap. We normalize each institution name (lowercase, drop "the", "of", "and", "university", etc.) and count how many are shared. We need at least one. In our John Doe example, both profiles list "Indian Institute of Technology Bombay" during the 2012 overlap year. Pass.
- Research concepts overlap. OpenAlex tags each work with concepts (Aerospace engineering, Mechanics, …). We need at least one shared concept across the two profiles. Both Does publish in Aerospace engineering. Pass.
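The three checks above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the Author Trends source: the profile fields (`name`, `institutions`, `concepts`), the function names, the letter-based tokenizer and the four-word stopword set (the list above ends in "etc.") are all hypothetical.

```python
import re

# Assumed stopwords: only the four words named in the text; the real list is longer.
STOPWORDS = {"the", "of", "and", "university"}


def name_tokens(display_name):
    """Lowercase, split into runs of letters, drop initials (tokens under 3 chars)."""
    tokens = re.findall(r"[^\W\d_]+", display_name.lower())
    return {t for t in tokens if len(t) >= 3}


def normalize_institution(name):
    """Lowercase an institution name and drop the stopwords."""
    words = re.findall(r"[^\W\d_]+", name.lower())
    return " ".join(w for w in words if w not in STOPWORDS)


def looks_like_same_person(a, b):
    """True only when all three signals agree (hypothetical profile dicts)."""
    shared_names = name_tokens(a["name"]) & name_tokens(b["name"])
    shared_insts = ({normalize_institution(i) for i in a["institutions"]}
                    & {normalize_institution(i) for i in b["institutions"]})
    shared_concepts = set(a["concepts"]) & set(b["concepts"])
    return (len(shared_names) >= 2
            and len(shared_insts) >= 1
            and len(shared_concepts) >= 1)


john = {"name": "John Doe",
        "institutions": ["Indian Institute of Technology Bombay"],
        "concepts": ["Aerospace engineering"]}
j_doe = {"name": "J. Doe",
         "institutions": ["Indian Institute of Technology Bombay", "MIT"],
         "concepts": ["Aerospace engineering", "Mechanics"]}
john_a = dict(j_doe, name="John A. Doe")

print(looks_like_same_person(john, j_doe))   # False: only "doe" is shared
print(looks_like_same_person(john, john_a))  # True: "john" and "doe" are shared
```

Requiring all three signals with `and` is what makes the test conservative: a pair that fails any one check is left unmerged.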
Two profiles can also be merged on an identical ORCID alone. That's the strongest possible signal, and it short-circuits the rules above.
Clustering is greedy single-linkage: if profile A merges with B and B merges with C, the three of them collapse into one cluster even if A and C don't independently pass the test.
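A minimal sketch of that clustering step, using a small union-find over the search results. The function names, the `id` and `orcid` fields, and the stand-in heuristic are assumptions made for illustration; the actual implementation may be structured differently.

```python
def same_person(a, b, heuristic):
    """An identical, non-empty ORCID short-circuits the heuristic check."""
    if a.get("orcid") and a.get("orcid") == b.get("orcid"):
        return True
    return heuristic(a, b)


def cluster_profiles(profiles, is_match):
    """Single-linkage clustering: any matching pair joins two clusters."""
    parent = list(range(len(profiles)))

    def find(i):
        # Follow parent links to the cluster root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if is_match(profiles[i], profiles[j]):
                parent[find(i)] = find(j)

    clusters = {}
    for i, profile in enumerate(profiles):
        clusters.setdefault(find(i), []).append(profile)
    return list(clusters.values())


# Toy data: A and B share an ORCID, B and C pass the heuristic, A and C do not.
profiles = [
    {"id": "A", "orcid": "0000-0000-0000-0001"},
    {"id": "B", "orcid": "0000-0000-0000-0001"},
    {"id": "C", "orcid": None},
    {"id": "D", "orcid": None},
]
passing_pairs = {frozenset({"B", "C"})}  # stand-in for the three-signal test


def toy_heuristic(a, b):
    return frozenset({a["id"], b["id"]}) in passing_pairs


clusters = cluster_profiles(
    profiles, lambda a, b: same_person(a, b, toy_heuristic))
print([[p["id"] for p in c] for c in clusters])  # [['A', 'B', 'C'], ['D']]
```

A and C end up together only through B, which is the chaining behaviour described above. It is also the main risk of single-linkage: one false match can pull two otherwise unrelated groups into one cluster.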
What the merged profile looks like
Once a cluster is formed, the profile with the highest works_count becomes the "primary"; that ID is what the URL and the cache key use. Then:
- All affiliations are unioned and sorted newest-first. For John Doe the merged card now shows MIT 2012–present above IIT Bombay 2002–2014, with no gaps.
- Works are streamed from every member ID and deduplicated by work ID, so if the same paper appears under both profiles (e.g. during a transition year), it's only counted once.
- Stats are summed across members for works and citations. The h-index is shown as the max across members: strictly speaking an h-index can't be summed, but the max is a lower bound on the true combined value, which is more useful than the truncated number from a single sliced profile.
- A purple "Merged 2 profiles" chip is always visible on the candidate card and the selected-author banner, with the source profile IDs in a tooltip. You can see at a glance whether the data you're looking at came from one OpenAlex record or several.
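The merge rules above could look something like this in Python. Treat it as a sketch: the dictionary layout (`affiliations` with `institution` and `years`, `cited_by_count`, `h_index`, `source_ids`), the function names and every number other than the two works counts are assumptions made for the example.

```python
def merge_cluster(members):
    """Combine the member profiles of one cluster into a single merged profile."""
    # The member with the most works becomes the primary ID.
    primary = max(members, key=lambda m: m["works_count"])

    # Union affiliations by institution, then sort by most recent year first.
    years_by_inst = {}
    for member in members:
        for aff in member["affiliations"]:
            years_by_inst.setdefault(aff["institution"], set()).update(aff["years"])
    newest_first = sorted(years_by_inst.items(),
                          key=lambda kv: max(kv[1], default=0),
                          reverse=True)

    return {
        "id": primary["id"],
        "source_ids": [m["id"] for m in members],
        "affiliations": [name for name, _ in newest_first],
        "works_count": sum(m["works_count"] for m in members),
        "cited_by_count": sum(m["cited_by_count"] for m in members),
        # An h-index cannot be summed; the max is a lower bound.
        "h_index": max(m["h_index"] for m in members),
    }


def dedupe_works(work_streams):
    """Yield each work once, keyed by work ID, across all member streams."""
    seen = set()
    for stream in work_streams:
        for work in stream:
            if work["id"] not in seen:
                seen.add(work["id"])
                yield work


# Citation counts, h-indices and years below are made-up placeholders.
a1 = {"id": "A11111111", "works_count": 112, "cited_by_count": 900, "h_index": 15,
      "affiliations": [{"institution": "IIT Bombay", "years": [2002, 2014]}]}
a2 = {"id": "A22222222", "works_count": 98, "cited_by_count": 700, "h_index": 12,
      "affiliations": [{"institution": "MIT", "years": [2012, 2020]},
                       {"institution": "IIT Bombay", "years": [2012]}]}

merged = merge_cluster([a1, a2])
print(merged["id"], merged["affiliations"], merged["works_count"])
# A11111111 ['MIT', 'IIT Bombay'] 210

streams = [[{"id": "W1"}, {"id": "W2"}], [{"id": "W2"}, {"id": "W3"}]]
print([w["id"] for w in dedupe_works(streams)])  # ['W1', 'W2', 'W3']
```

Writing `dedupe_works` as a generator matches the "streamed" wording: works can be consumed as they arrive, and only the set of seen IDs is held in memory.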
Where the merge can still fail
The heuristic is intentionally conservative: we'd rather show two separate profiles than wrongly fuse two different people who share a common name. But that means we sometimes miss real duplicates, especially when:
- The surname is very common and one of the profiles has only one long name token (e.g. L. Wang matched against Liu Wang fails the "≥ 2 shared tokens" rule).
- The two profiles have no overlapping institution because one covers the very early career (PhD only) and the other starts after a long gap.
- The two profiles have no overlapping concept because the researcher made a career switch.
If you spot a case where the merge should have happened (or shouldn't have), please tell us through the feedback form. We use these reports to tune the heuristic.
Also, if your profile wrongly lists papers you did not write, or if other OpenAlex profiles contain your papers, please contact OpenAlex directly using this form to have it corrected. Author Trends just renders what OpenAlex returns, so fixing the attribution at the source is the cleanest path.
A note on transparency
Every merged card carries an expandable "Source profiles" section listing each member ID, its ORCID, and its individual works count. Nothing is hidden behind the merge: if the result looks wrong, you can always click through to the original OpenAlex records and verify.