All posts

(25)
[NLP] RoBERTa: A Robustly Optimized BERT Pretraining Approach
[GNN] Node2Vec: Scalable Feature Learning for Networks 0. Abstract Previous studies: prediction tasks -> much progress in feature learning, BUT connectivity patterns are not learned well. This paper: low-dimensional feature learning & maximizing likelihoods of nodes; flexible notion of nodes (no overly strict definition of a "neighborhood node") 1. Node2Vec semi-supervised algorithm supervised: expensive for real-world unsupervised: does not generalize SGD motivation #Flexibility #Generalization u, s6..
[GNN] DeepWalk: Online Learning of Social Representations https://arxiv.org/pdf/1403.6652.pdf 0. Main Idea Graph data --> low-dim dense representation "Embedding" Graph --> NLP method --> Embedding Graph --> Random pattern --> create NL sequence (random walk sequence) random walk sequence --> Skip-Gram algorithm --> Node Embedding 1. Random walk DeepWalk: generates sequences from the graph and learns embeddings with the Skip-Gram method from NLP. Here, the process of "generating sequences from the graph" = Random Walk ..
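The pipeline in the DeepWalk preview above (graph --> random walk sequences --> Skip-Gram training pairs) can be sketched roughly as follows. This is a minimal illustration, not the post's or the paper's code: the toy graph, the function names (`random_walk`, `generate_walks`, `skipgram_pairs`) and the parameter values are all assumptions made for the example, and the embedding training step itself is left out.

```python
import random


def random_walk(adj, start, walk_length, rng):
    """Uniform random walk of at most walk_length nodes, starting at start."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(adj, walks_per_node, walk_length, seed=0):
    """Start walks_per_node walks from every node, in shuffled node order."""
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(adj, node, walk_length, rng))
    return walks


def skipgram_pairs(walk, window):
    """(center, context) pairs within a window, as Skip-Gram would consume them."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# Hypothetical toy graph as an adjacency list (undirected, so edges appear twice).
toy_graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
walks = generate_walks(toy_graph, walks_per_node=2, walk_length=5)
```

Each walk plays the role of a "sentence" and each node the role of a "word", so the resulting pairs could be fed to any Skip-Gram implementation to obtain node embeddings.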
[NLP] ERNIE: Enhanced Representation through Knowledge Integration https://arxiv.org/abs/1904.09223
[DACON] JobCare Recommendation Algorithm Part 3 5. Experiment 1: split the features into Categorical / Ordinal values and Simple / Complex values, then build a new dataset from the predictions from sklearn.model_selection import KFold, cross_val_score from sklearn.model_selection import GridSearchCV, train_test_split from sklearn.preprocessing import OneHotEncoder, LabelEncoder import pandas as pd complex_lst=[8, 9, 10, 13, 14, 15, 22, 23, 25] x = train.iloc[:, :-1] y = train.iloc[:, -1] x.appl..
[NLP] GPT-2: Language Models are Unsupervised Multitask Learners https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf 0. Abstract zero-shot setting BUT good performance & underfits WebText learn to perform tasks from their naturally occurring demonstrations -> promising! 1. Introduction current ML -> sensitive to data distribution -> narrow experts GPT-2 = more general systems which can perform many tasks #Multitask l..
[Project] E2E_OCR Model Architecture This post is protected.
[CODE] Matching the data image and annotation formats def main(db_fname): print('START!') db = h5py.File(db_fname, 'r') dsets = sorted(db['data'].keys()) print ("total number of images : ", colorize(Color.RED, len(dsets), highlight=True)) print() for num ,k in enumerate(dsets): #add txt f = open('{}/gt_img_{}.txt'.format(GT_PATH,num),'w') rgb = db['data'][k][...] charBB = db['data'][k].attrs['charBB'] wordBB = db['data'][k].attrs['wordBB'] txt = db..