Notes on Aggregating over Doubly Nested Structures in OpenSearch

oguri's garage

oguri 2025. 10. 13. 20:23

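Introduction

This post documents how Nested Aggregation and Reverse Nested work in OpenSearch, along with the trial and error I went through, and the solution I arrived at, while handling a doubly nested structure in a real project.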

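Key Takeaways

Nested Aggregation

Purpose: aggregate while preserving the relationships between fields inside each array element
How it works: each array element is stored as a hidden document, so elements stay independent
When to use: when pairings such as author-rating or name-price matter

Reverse Nested

Purpose: access parent-document fields from inside a nested context
How it works: leaves the nested scope and returns to the main-document level
When to use: when parent-document information is needed after a nested aggregation

Double Nested + Reverse Nested

Complexity: very high; avoid it if you can
Required steps:
1. Enter the first nested level
2. Enter the second nested level
3. Run the aggregation you need
4. Return to the parent document with reverse nested
Alternatives to consider: redesigning the data model, denormalization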

1. Why the Nested Type Is Needed

1.1 The Limitation of Plain Arrays: The Flatten Problem

By default, OpenSearch stores arrays of objects "flattened".

Let's look at an example to see what that means.

{
  "product": "Laptop",
  "reviews": [
    {"author": "Alice", "rating": 5},
    {"author": "Bob", "rating": 3}
  ]
}

If this is stored as a regular (non-nested) object field, OpenSearch indexes it internally like this:

{
  "product": "Laptop",
  "reviews.author": ["Alice", "Bob"],
  "reviews.rating": [5, 3]
}

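The problem: the relationship between author and rating is lost entirely. A query for "a review where Alice gave 3 points" now returns this document, even though Alice actually gave 5.

1.2 How the Nested Type Solves It

The nested type stores each array element as an independent hidden document:

// Main document
{
  "product": "Laptop"
}

// Hidden nested document 1
{
  "reviews.author": "Alice",
  "reviews.rating": 5
}

// Hidden nested document 2
{
  "reviews.author": "Bob",
  "reviews.rating": 3
}

Now the relationship between author and rating is preserved. A nested query for "a review where Alice gave 3 points" no longer returns this document.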
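To make the difference concrete, here is a minimal plain-Java sketch that mimics the two matching behaviors in memory. It does not talk to OpenSearch at all; the class `FlattenVsNested`, the record `Review`, and both method names are hypothetical and exist only for illustration.

```java
import java.util.List;

final class FlattenVsNested {

    // Hypothetical in-memory model of one review; not an OpenSearch type.
    record Review(String author, int rating) {}

    // Flattened semantics: each condition is checked against the pooled values
    // of the whole array, so author and rating may come from different reviews.
    static boolean matchesFlattened(List<Review> reviews, String author, int rating) {
        boolean anyAuthor = reviews.stream().anyMatch(r -> r.author().equals(author));
        boolean anyRating = reviews.stream().anyMatch(r -> r.rating() == rating);
        return anyAuthor && anyRating;
    }

    // Nested semantics: both conditions must hold within the same element.
    static boolean matchesNested(List<Review> reviews, String author, int rating) {
        return reviews.stream()
            .anyMatch(r -> r.author().equals(author) && r.rating() == rating);
    }

    public static void main(String[] args) {
        List<Review> reviews = List.of(new Review("Alice", 5), new Review("Bob", 3));
        System.out.println(matchesFlattened(reviews, "Alice", 3)); // true (false positive)
        System.out.println(matchesNested(reviews, "Alice", 3));    // false
    }
}
```

The flattened check passes because "Alice" and 3 both occur somewhere in the array, which is exactly the false positive described in 1.1.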

Official documentation: OpenSearch Nested Field Type

2. How Nested Aggregation Works

2.1 Building the Query: "Aggregate It Like This"

A nested aggregation can be understood as two steps:

SearchRequest request = SearchRequest.of(s -> s
    .index("blog_posts")
    .size(0)
    .aggregations("comments_analysis", a -> a
        // Step 1: "enter" the nested field
        .nested(n -> n.path("comments"))

        // Step 2: define the aggregation over the nested documents
        .aggregations("by_author", sub -> sub
            .terms(t -> t.field("comments.author"))
        )
    )
);

Key point: .nested(n -> n.path("comments")) declares "from here on, operate on the hidden documents of the comments array."

2.2 Processing the Result: "Taking Out the Computed Result"

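// Imports assume the opensearch-java client
import java.util.List;
import java.util.Map;
import org.opensearch.client.opensearch._types.aggregations.Aggregate;
import org.opensearch.client.opensearch._types.aggregations.NestedAggregate;
import org.opensearch.client.opensearch._types.aggregations.StringTermsBucket;
import org.opensearch.client.opensearch.core.SearchResponse;

SearchResponse<Void> response = client.search(request, Void.class);

// Step 1: access the nested aggregation result
Map<String, Aggregate> aggs = response.aggregations();
Aggregate commentsAgg = aggs.get("comments_analysis");

// Step 2: convert to the NestedAggregate type with nested()
NestedAggregate nestedResult = commentsAgg.nested();
long totalNestedDocs = nestedResult.docCount(); // total number of comments (nested documents)

// Step 3: access the sub-aggregation results
Map<String, Aggregate> subAggs = nestedResult.aggregations();
List<StringTermsBucket> authorBuckets = subAggs
    .get("by_author")
    .sterms()
    .buckets()
    .array();

// Step 4: use the results
for (StringTermsBucket bucket : authorBuckets) {
    System.out.println(bucket.key() + ": " + bucket.docCount() + " comments");
}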

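2.3 Building the Query vs. Processing the Result

Aspect	When building the query	When processing the result
`.aggregations()`	`.aggregations(name, agg)`: adds an aggregation definition	`.aggregations()`: returns the map of computed results
Role	"Compute it like this" (command)	"Where is the computed result?" (lookup)
Parameters	2 (name, aggregation object)	None (getter)
Analogy	Writing the recipe	Taking out the finished dish

Without understanding this difference, the code gets confusing. At first I was puzzled too: "Why is .aggregations() used twice, with different parameters?"

Official documentation: OpenSearch Nested Aggregation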

3. Reverse Nested: Escaping from Nested

3.1 The Problem

Once you enter a nested context, you are confined to that scope. Fields of the parent document are not accessible.

// This code does not work
SearchRequest request = SearchRequest.of(s -> s
    .index("blog_posts")
    .aggregations("comments_agg", a -> a
        .nested(n -> n.path("comments"))
        .aggregations("by_author", sub -> sub
            .terms(t -> t.field("comments.author"))
            .aggregations("post_category", cat -> cat
                .terms(t -> t.field("category"))  // ❌ not accessible here!
            )
        )
    )
);

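Why? We are currently in the "comment (nested document)" scope, whereas category is a field of the "post (main document)". The hidden comment documents carry no category field, so this terms aggregation has nothing to bucket.

3.2 How Reverse Nested Solves It

Reverse Nested is an aggregation that escapes from the nested scope back to the parent-document level.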

SearchRequest request = SearchRequest.of(s -> s
    .index("blog_posts")
    .aggregations("vip_analysis", a -> a
        // 1. Enter the comments nested context
        .nested(n -> n.path("comments"))
        .aggregations("vip_filter", sub -> sub
            // 2. Keep only VIP comments
            .filter(f -> f.term(t -> t.field("comments.is_vip").value(true)))
            .aggregations("back_to_post", reverse -> reverse
                // 3. Return to the post level (Reverse Nested)
                .reverseNested(rn -> rn)
                .aggregations("categories", cat -> cat
                    // 4. The post's category field is now accessible
                    .terms(t -> t.field("category"))
                )
            )
        )
    )
);

3.3 Processing the Reverse Nested Result

SearchResponse<Void> response = client.search(request, Void.class);

// Category distribution of posts that have VIP comments
List<StringTermsBucket> categories = response.aggregations()
    .get("vip_analysis").nested()              // nested result
    .aggregations().get("vip_filter").filter()  // filter result
    .aggregations().get("back_to_post").reverseNested()  // back to the post level
    .aggregations().get("categories").sterms()  // category aggregation
    .buckets().array();

// Interpreting the result
for (StringTermsBucket bucket : categories) {
    // e.g. "5 Tech posts have VIP comments"
    System.out.println(bucket.key() + " posts: " + bucket.docCount());
}

Key point: Reverse Nested is a "round-trip ticket". You go into the nested context (path("comments")) and then come back out (reverseNested()).
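One consequence of the round trip is easy to miss: the doc count inside the nested context counts comments, while the doc counts under reverse nested count posts. The following plain-Java sketch reproduces both numbers in memory under that reading. The class `ReverseNestedCounts`, its records, and the sample posts are all hypothetical, and nothing here calls OpenSearch.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ReverseNestedCounts {

    // Hypothetical in-memory stand-ins for a post and its nested comments.
    record Comment(String author, boolean isVip) {}
    record Post(String category, List<Comment> comments) {}

    // What the filter bucket inside the nested context counts: VIP comments.
    static long vipCommentCount(List<Post> posts) {
        return posts.stream()
            .flatMap(p -> p.comments().stream())
            .filter(Comment::isVip)
            .count();
    }

    // What the terms buckets under reverse nested count: posts (parent documents)
    // that own at least one VIP comment, grouped by category.
    static Map<String, Long> postsWithVipCommentByCategory(List<Post> posts) {
        Map<String, Long> counts = new TreeMap<>();
        for (Post p : posts) {
            if (p.comments().stream().anyMatch(Comment::isVip)) {
                counts.merge(p.category(), 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Post> posts = List.of(
            new Post("Tech", List.of(new Comment("a", true), new Comment("b", true))),
            new Post("Tech", List.of(new Comment("c", false))),
            new Post("Life", List.of(new Comment("d", true))));
        System.out.println(vipCommentCount(posts));               // 3
        System.out.println(postsWithVipCommentByCategory(posts)); // {Life=1, Tech=1}
    }
}
```

Three VIP comments map back to only two posts, so the two levels should not be expected to add up to the same total.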

Official documentation: OpenSearch Reverse Nested

4. Double Nested Structures: Increasing Complexity

4.1 The Actual Problem

The data structure I faced looked like this (not the real data; I substituted a similar structure):

{
  "order_id": "ORDER-001",
  "customer_region": "Seoul",
  "products": [                        // Nested 1
    {
      "product_name": "Laptop",
      "category": "Electronics",
      "reviews": [                      // Nested 2 (double nested)
        {
          "rating": 5,
          "review_type": "verified",
          "comment": "Great product"
        }
      ]
    }
  ]
}

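Requirement: "For each review rating (reviews.rating), which region's (customer_region) customers placed the most orders?"

Challenges:

The review information lives inside products.reviews (double nested)
The region information lives on the top-level document
After descending through two nested levels, we have to come back out to the top level

4.2 Trial and Error 1: Using Only a Single Nested

// ❌ Fails
.nested(n -> n.path("products"))
.aggregations("ratings", sub -> sub
    .terms(t -> t.field("products.reviews.rating"))  // not accessible
)

Why it fails: reviews is itself nested, so one more nested aggregation is required.

4.3 Trial and Error 2: Using Only Double Nested

// ❌ Fails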
.nested(n -> n.path("products"))
.aggregations("product_reviews", sub -> sub
    .nested(n -> n.path("products.reviews"))
    .aggregations("by_rating", rating -> rating
        .terms(t -> t.field("products.reviews.rating"))
        .aggregations("regions", region -> region
            .terms(t -> t.field("customer_region"))  // not accessible!
        )
    )
)

Why it fails: from inside the double nested context, customer_region on the top-level document is not accessible.

4.4 Solution: Using Reverse Nested

public Aggregation createRatingRegionAggregation() {
    return Aggregation.of(a -> a
        // 1. Enter the products nested context
        .nested(n -> n.path("products"))
        .aggregations("product_reviews", product -> product

            // 2. Enter the products.reviews nested context (double nested)
            .nested(n -> n.path("products.reviews"))
            .aggregations("verified_reviews", review -> review

                // 3. Keep only verified reviews
                .filter(f -> f.term(t -> t
                    .field("products.reviews.review_type")
                    .value("verified")
                ))
                .aggregations("rating_terms", rating -> rating

                    // 4. Group by rating (e.g. 5, 4, 3 ...)
                    .terms(t -> t
                        .field("products.reviews.rating")
                        .size(10)
                    )
                    .aggregations("region_agg", region -> region

                        // 5. Return to the top-level document (the key step!)
                        .reverseNested(rn -> rn)
                        .aggregations("region_terms", regionTerms -> regionTerms

                            // 6. Aggregate by region (now accessible)
                            .terms(t -> t
                                .field("customer_region")
                                .size(50)
                            )
                        )
                    )
                )
            )
        )
    );
}
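To sanity-check what the aggregation above should return, it can help to compute the same numbers in memory on a tiny data set. This is a minimal sketch, assuming that the buckets under reverse nested count orders (parent documents) rather than reviews. The class `DoubleNestedSketch`, its records, and the second sample order are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class DoubleNestedSketch {

    // Hypothetical in-memory mirror of the order document from 4.1.
    record Review(int rating, String reviewType) {}
    record Product(String name, List<Review> reviews) {}
    record Order(String orderId, String region, List<Product> products) {}

    // rating -> region -> number of orders with at least one verified review of that rating
    static Map<Integer, Map<String, Long>> ordersByRatingAndRegion(List<Order> orders) {
        Map<Integer, Map<String, Long>> result = new TreeMap<>();
        for (Order order : orders) {
            // Collect distinct ratings so each order is counted once per rating,
            // the way reverse nested counts parent documents instead of reviews.
            Set<Integer> ratings = new HashSet<>();
            for (Product product : order.products()) {             // nested level 1
                for (Review review : product.reviews()) {          // nested level 2
                    if ("verified".equals(review.reviewType())) {  // filter step
                        ratings.add(review.rating());
                    }
                }
            }
            for (int rating : ratings) {
                result.computeIfAbsent(rating, r -> new TreeMap<>())
                      .merge(order.region(), 1L, Long::sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
            new Order("ORDER-001", "Seoul", List.of(
                new Product("Laptop", List.of(new Review(5, "verified"), new Review(5, "verified"))))),
            new Order("ORDER-002", "Busan", List.of(
                new Product("Mouse", List.of(new Review(5, "verified"), new Review(3, "unverified"))))));
        System.out.println(ordersByRatingAndRegion(orders)); // {5={Busan=1, Seoul=1}}
    }
}
```

Comparing a handful of hand-computed values like these against the real response is a cheap way to confirm that each nested and reverse nested step sits at the intended level.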

4.5 Result Processing Code

public List<RatingStats> processResults(SearchResponse<Order> response) {
    return response.aggregations()
        // Step 1: products nested result
        // (assumes the aggregation from 4.4 was registered as "rating_aggregation")
        .get("rating_aggregation").nested()
        .aggregations()

        // Step 2: products.reviews nested result
        .get("product_reviews").nested()
        .aggregations()

        // Step 3: filter result
        .get("verified_reviews").filter()
        .aggregations()

        // Step 4: buckets per rating
        // (sterms() assumes rating is mapped as keyword; a numeric mapping comes back as lterms())
        .get("rating_terms").sterms()
        .buckets().array()
        .stream()
        .map(ratingBucket -> {
            int rating = Integer.parseInt(ratingBucket.key());
            long count = ratingBucket.docCount();

            // Step 5: region statistics for each rating
            List<RegionStats> regions = ratingBucket.aggregations()
                .get("region_agg").reverseNested()  // back to the top level
                .aggregations()
                .get("region_terms").sterms()
                .buckets().array()
                .stream()
                .map(rb -> new RegionStats(rb.key(), rb.docCount()))
                .toList();

            return new RatingStats(rating, count, regions);
        })
        .toList();
}
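The snippet above references RatingStats and RegionStats without showing them. A minimal sketch, assuming they are plain value carriers whose components follow the constructor calls above; the component names, the `topRegion` helper, and the sample numbers are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class RatingStatsSketch {

    // Assumed shape: one region bucket (bucket key and parent-document count).
    record RegionStats(String region, long orderCount) {}

    // Assumed shape: one rating bucket plus its per-region breakdown.
    record RatingStats(int rating, long reviewCount, List<RegionStats> regions) {

        // Hypothetical helper: the region with the highest order count, if any.
        Optional<RegionStats> topRegion() {
            return regions.stream()
                .max(Comparator.comparingLong(RegionStats::orderCount));
        }
    }

    public static void main(String[] args) {
        RatingStats stats = new RatingStats(5, 12,
            List.of(new RegionStats("Seoul", 7), new RegionStats("Busan", 4)));
        System.out.println(stats.topRegion().map(RegionStats::region).orElse("n/a")); // Seoul
    }
}
```

Keeping the review count and the order counts in separately named components makes it harder to mix up the nested-level and parent-level numbers later.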

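5. Lessons Learned from Applying This in Practice

5.1 Data Modeling Determines Aggregation Complexity

Every additional nested level adds another layer to both the query and the result handling, so complexity grows quickly with nested depth. Where possible:

Keep nested depth to two levels or fewer
Use nested only when the relationship really has to be preserved
Consider denormalization as an alternative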
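As one illustration of the denormalization option, the order document from 4.1 could be split into one flat document per review, with the parent fields copied onto each. This is only a sketch of the idea, not something the project actually did; `DenormalizeSketch`, `ReviewDoc`, and `flatten` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class DenormalizeSketch {

    // Hypothetical in-memory mirror of the nested order document.
    record Review(int rating, String reviewType) {}
    record Product(String productName, List<Review> reviews) {}
    record Order(String orderId, String customerRegion, List<Product> products) {}

    // One flat document per review, carrying copies of the parent fields.
    record ReviewDoc(String orderId, String customerRegion, String productName,
                     int rating, String reviewType) {}

    static List<ReviewDoc> flatten(Order order) {
        List<ReviewDoc> docs = new ArrayList<>();
        for (Product product : order.products()) {
            for (Review review : product.reviews()) {
                docs.add(new ReviewDoc(order.orderId(), order.customerRegion(),
                    product.productName(), review.rating(), review.reviewType()));
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        Order order = new Order("ORDER-001", "Seoul", List.of(
            new Product("Laptop", List.of(new Review(5, "verified"), new Review(4, "unverified")))));
        List<ReviewDoc> docs = flatten(order);
        System.out.println(docs.size());                  // 2
        System.out.println(docs.get(0).customerRegion()); // Seoul
    }
}
```

With documents shaped like this, rating and region sit on the same document and no nested or reverse nested step is needed. The trade-off is that plain bucket counts then count reviews rather than orders, and the parent fields are duplicated on every review.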

5.2 Test First with a Tool Such as OpenSearch Dashboards

Rather than writing the Java code right away, first verify the JSON query in the Dev Tools console of OpenSearch Dashboards:

GET /threats/_search
{
  "size": 0,
  "aggs": {
    "galaxy_agg": {
      "nested": {"path": "galaxy"},
      "aggs": {
        "cluster_agg": {
          "nested": {"path": "galaxy.galaxyCluster"},
          "aggs": {
            "patterns": {
              "terms": {"field": "galaxy.galaxyCluster.value"},
              "aggs": {
                "back_to_doc": {
                  "reverse_nested": {},
                  "aggs": {
                    "countries": {
                      "terms": {"field": "source_country"}
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

If the JSON looks complicated, the query is complicated. Look for a simpler approach.

5.3 Readability of Method Chaining

Processing double nested + reverse nested results leads to long method chains. For readability:

// ❌ Bad
List<String> countries = response.aggregations().get("a").nested().aggregations().get("b").nested().aggregations().get("c").filter().aggregations().get("d").sterms().buckets().array().stream().map(b -> b.key()).toList();

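// ✅ Good (the names assume a query variant in which "patterns" is a filter
//    aggregation with a "country_terms" sub-aggregation, unlike the query in 5.2)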
List<String> countries = response.aggregations()
    .get("galaxy_agg").nested()
    .aggregations().get("cluster_agg").nested()
    .aggregations().get("patterns").filter()
    .aggregations().get("country_terms").sterms()
    .buckets().array()
    .stream()
    .map(StringTermsBucket::key)
    .toList();

// ✅ Better: extract a helper method
List<String> countries = extractCountries(response.aggregations());

private List<String> extractCountries(Map<String, Aggregate> aggs) {
    return Optional.ofNullable(aggs.get("galaxy_agg"))
        .map(a -> a.nested().aggregations().get("cluster_agg"))
        .map(a -> a.nested().aggregations().get("patterns"))
        .map(a -> a.filter().aggregations().get("country_terms"))
        .map(a -> a.sterms().buckets().array())
        .orElse(List.of())
        .stream()
        .map(StringTermsBucket::key)
        .toList();
}
