Commit fa2e6341 authored by Richard Glosner

Merge branch '101-update-log-format-2' into 'main'

Resolve "Update log format"

Closes #101

See merge request inject/inject-docs!181
parents 37bd7fef 506c5c50
+64 −1
@@ -9,6 +9,10 @@ Below is a description of the structure and schema of logs.
  - team-id/
    - uploaded_files/
        - files uploaded by team-id and for team-id during the exercise
    - llm-evaluations/
        - email_suggestions.jsonl
        - email_evaluations.jsonl
        - free_form_evaluations.jsonl
    - inject_states.jsonl
    - questionnaire_states.jsonl
    - action_logs.jsonl
@@ -29,6 +33,7 @@ Below is a description of the structure and schema of logs.
  - exercise_injects.jsonl
  - exercise_channels.jsonl
  - email_participants.jsonl
  - llm_assessments.jsonl
```
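
Every file in the tree above uses the `.jsonl` extension, i.e. one JSON object per line, so the new per-team `llm-evaluations/` files can be loaded with the standard library alone. The following is a minimal sketch, assuming `team_dir` points at a single `team-id/` directory; `read_jsonl` and `load_team_llm_evaluations` are hypothetical helper names, not part of any tooling shipped with the platform.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON Lines: one JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines and a trailing newline
            yield json.loads(line)


def load_team_llm_evaluations(team_dir: Path) -> dict[str, list[dict]]:
    """Load the files under <team_dir>/llm-evaluations/, keyed by file stem.

    ASSUMPTION: a missing file is treated as empty; the docs do not say
    whether empty files are omitted from the export.
    """
    names = ["email_suggestions", "email_evaluations", "free_form_evaluations"]
    result: dict[str, list[dict]] = {}
    for name in names:
        path = team_dir / "llm-evaluations" / f"{name}.jsonl"
        if path.exists():
            with path.open(encoding="utf-8") as f:
                result[name] = list(read_jsonl(f))
        else:
            result[name] = []
    return result
```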

## Description and format of individual files
@@ -207,6 +212,7 @@ Each object has the following format:
#### _Free-form_ question details

- **related_milestone_ids**: _list of int_
- **assessment_id**: _optional int_ - id of the LLM assessment

#### _Auto-free-form_ question details

@@ -268,6 +274,7 @@ Each object has the following format:
    - **team_visible**: _boolean_
    - **organization**: _string_
    - **control**: _control_
    - **assessment_id**: _optional int_ - id of the LLM assessment

### file_infos.jsonl

@@ -281,6 +288,15 @@ Each object has the following format:
- **uploaded_at**: _optional timestamp_ - timestamp when this file was uploaded, null if the file
    was not uploaded

### llm_assessments.jsonl

Contains all LLM assessments for this exercise.
Each object has the following format:

- **assessment_id**: _int_ - id of the assessment
- **persona**: _string_
- **assessment**: _string_
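
Other objects point into this file through `assessment_id`: it is optional on free-form question details and on some other exercise objects, and required on the per-team evaluation files. A minimal sketch of resolving those references, assuming the file has already been parsed into a list of dicts; `index_assessments` and `resolve_assessment` are hypothetical names.

```python
from typing import Optional


def index_assessments(assessments: list[dict]) -> dict[int, dict]:
    """Map assessment_id -> assessment object from llm_assessments.jsonl."""
    index: dict[int, dict] = {}
    for a in assessments:
        if a["assessment_id"] in index:
            raise ValueError(f"duplicate assessment_id {a['assessment_id']}")
        index[a["assessment_id"]] = a
    return index


def resolve_assessment(obj: dict, index: dict[int, dict]) -> Optional[dict]:
    """Return the assessment referenced by obj["assessment_id"], or None.

    Where assessment_id is optional, both a missing key and an explicit
    null are treated as "no assessment attached".
    """
    assessment_id = obj.get("assessment_id")
    if assessment_id is None:
        return None
    return index[assessment_id]
```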

## Individual team files

### inject_states.jsonl
@@ -316,6 +332,9 @@ Each object has the following format:
    - **attempt**: _int_ - attempt number of this submission
    - **accepted**: _boolean_ - flag that determines whether this submission was accepted by the
        platform, controlled by the repeatable field on the questionnaire
    - **correct**: _(`Correct`, `Incorrect`, `Partially Correct`, `Unknown`)_ -
        correctness of the whole submission,
        excluding questions which cannot be automatically evaluated
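
Because `correct` summarizes a whole submission, a natural first analysis is to count outcomes across a team's attempts. A minimal sketch over a list of already-parsed submission objects with the fields listed above; `correctness_summary` and its `accepted_only` switch are hypothetical. As the field description states, questions that cannot be evaluated automatically are not reflected in `correct`.

```python
from collections import Counter

CORRECTNESS_VALUES = ("Correct", "Incorrect", "Partially Correct", "Unknown")


def correctness_summary(submissions: list[dict], accepted_only: bool = True) -> Counter:
    """Count submissions per `correct` value, optionally only accepted ones."""
    counts: Counter = Counter()
    for s in submissions:
        if accepted_only and not s["accepted"]:
            continue  # not accepted by the platform (see `accepted`)
        value = s["correct"]
        if value not in CORRECTNESS_VALUES:
            raise ValueError(f"unexpected correctness value: {value!r}")
        counts[value] += 1
    return counts
```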

### milestones.jsonl

@@ -353,7 +372,7 @@ Each object has the following format:

- **action_log_id**: _int_ - id of the action log
- **type**: _(`Inject`, `Custom Inject`, `Tool`, `Email`, `Form`, `Form Submission`, `Form Review`,
    `Confirmation`, `File Download`, `Milestone Modification`)_ - type of the action log
    `Confirmation`, `File Download`, `Milestone Modification`, `Sandbox Log`)_ - type of the action log
- **timestamp**: _timestamp_ - time when this action log was created
- **channel_id**: _int_ - id of the channel this action log was sent to
- **instructor_comment**: _optional instructor comment_
@@ -422,6 +441,50 @@ This object currently contains no additional fields.
- **cause**: _(0, 2, 4)_ - the cause for this milestone modification, 0 for trainee action, 2 for
    instructor action, 4 for automatic action
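
The `cause` codes are sparse (0, 2, 4), so decoding them through an enum rather than comparing raw integers keeps analysis scripts readable and rejects unexpected values with a `ValueError`. A small sketch; `MilestoneModificationCause` and `describe_cause` are hypothetical names.

```python
from enum import IntEnum


class MilestoneModificationCause(IntEnum):
    """Values of the `cause` field of Milestone Modification details."""

    TRAINEE = 0     # trainee action
    INSTRUCTOR = 2  # instructor action
    AUTOMATIC = 4   # automatic action


def describe_cause(raw: int) -> str:
    """Return a readable label; raises ValueError for unknown codes."""
    return MilestoneModificationCause(raw).name.lower()
```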

#### _Sandbox Log_ details

- **cmd**: _string_ - the executed command
- **cmd_source**: _string_ - the source of the log within the container (e.g., Filebeat)
- **working_directory**: _string_ - the directory within the container in which the command was
    executed
- **username**: _string_ - the user by whom the command was executed within the container
- **container**: _string_ - the name of the container in which the command was executed
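
To reconstruct what a team ran in each sandbox container, the _Sandbox Log_ entries can be filtered out of `action_logs.jsonl` and grouped by `container`. A minimal sketch over already-parsed action logs. It assumes the type-specific fields shown above sit under a key named `details`; that key name is an assumption, since this excerpt does not show where type-specific details are stored. `commands_by_container` is a hypothetical name.

```python
from collections import defaultdict

# ASSUMPTION: the key holding the type-specific fields is not shown in this
# excerpt; "details" is a hypothetical name, adjust it to the real export.
DETAILS_KEY = "details"


def commands_by_container(action_logs: list[dict]) -> dict[str, list[tuple[str, str, str]]]:
    """Group Sandbox Log commands per container as (username, working_directory, cmd).

    Entries keep the order in which they appear in `action_logs`.
    """
    grouped: dict[str, list[tuple[str, str, str]]] = defaultdict(list)
    for log in action_logs:
        if log["type"] != "Sandbox Log":
            continue  # skip emails, tools, form submissions, ...
        details = log[DETAILS_KEY]
        grouped[details["container"]].append(
            (details["username"], details["working_directory"], details["cmd"])
        )
    return dict(grouped)
```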

### email_suggestions.jsonl

Contains all email suggestions generated for this team.
Each object has the following format:

- **suggestion_id**: _int_ - id of the suggestion
- **thread_id**: _int_ - id of the thread
- **trigger_email_id**: _int_ - id of the email that the suggestion is responding to
- **email_participant_id**: _int_ - id of the definition participant as whom the LLM suggests
    responding
- **response**: _string_ - the suggested text
- **created_at**: _timestamp_ - time when this suggestion was created
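
The schema does not appear to prevent several suggestions for the same incoming email, so grouping by `trigger_email_id` is a convenient first step when comparing them. A minimal sketch over already-parsed suggestion objects; `suggestions_by_trigger` is a hypothetical name.

```python
from collections import defaultdict


def suggestions_by_trigger(suggestions: list[dict]) -> dict[int, list[dict]]:
    """Group email suggestions by the email they respond to (trigger_email_id).

    Suggestions keep their file order within each group.
    """
    grouped: dict[int, list[dict]] = defaultdict(list)
    for s in suggestions:
        grouped[s["trigger_email_id"]].append(s)
    return dict(grouped)
```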

### email_evaluations.jsonl

Contains all email evaluations generated for this team.
Each object has the following format:

- **evaluation_id**: _int_ - id of the email evaluation
- **action_log_id**: _int_ - id of the email action log
- **assessment_id**: _int_ - id of the LLM assessment
- **response**: _string_ - the text of the evaluation
- **created_at**: _timestamp_ - time when this evaluation was created
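
`email_evaluations.jsonl` is stored per team, while the assessments it references live in the exercise-wide `llm_assessments.jsonl`, so reading an evaluation together with its persona requires a join on `assessment_id`. A minimal sketch over already-parsed objects; `attach_assessments` is a hypothetical name. The same join should work for `free_form_evaluations.jsonl`, which carries the same `assessment_id` field.

```python
def attach_assessments(evaluations: list[dict], assessments: list[dict]) -> list[dict]:
    """Return copies of evaluations extended with the referenced persona and assessment."""
    by_id = {a["assessment_id"]: a for a in assessments}
    joined = []
    for ev in evaluations:
        a = by_id.get(ev["assessment_id"])
        if a is None:
            raise KeyError(
                f"evaluation {ev['evaluation_id']} references unknown "
                f"assessment {ev['assessment_id']}"
            )
        # build a new dict so the parsed input objects are left untouched
        joined.append({**ev, "persona": a["persona"], "assessment": a["assessment"]})
    return joined
```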

### free_form_evaluations.jsonl

Contains all free-form evaluations generated for this team.
Each object has the following format:

- **evaluation_id**: _int_ - id of the free-form evaluation
- **submission_id**: _int_ - id of the questionnaire submission
- **question_id**: _int_ - id of the free-form question
- **assessment_id**: _int_ - id of the LLM assessment
- **response**: _string_ - the text of the evaluation
- **created_at**: _timestamp_ - time when this evaluation was created
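
A free-form evaluation is tied to a single answer through the pair (`submission_id`, `question_id`), so indexing on that pair lets an analysis script look up the LLM's evaluation while iterating over questionnaire submissions. A minimal sketch; `index_free_form_evaluations` is a hypothetical name, and because the docs do not say whether one answer can be evaluated more than once, the index keeps a list per key.

```python
from collections import defaultdict


def index_free_form_evaluations(evaluations: list[dict]) -> dict[tuple[int, int], list[dict]]:
    """Index free-form evaluations by (submission_id, question_id)."""
    index: dict[tuple[int, int], list[dict]] = defaultdict(list)
    for ev in evaluations:
        index[(ev["submission_id"], ev["question_id"])].append(ev)
    return dict(index)
```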

## Comparing logs from multiple exercises

The logs are constructed in a way that should allow for simple comparison of logs from multiple